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Abstract 

Quantum annealing is a generic name of quantum algorithms to use quantum-mechanical 
fluctuations to search for the solution of optimization problem. It shares the basic idea 
with quantum adiabatic evolution studied actively in quantum computation. The present 
paper reviews the mathematical and theoretical foundation of quantum annealing. In par- 
ticular, theorems are presented for convergence conditions of quantum annealing to the 
target optimal state after an infinite-time evolution following the Schrodinger or stochas- 
tic (Monte Carlo) dynamics. It is proved that the same asymptotic behavior of the 
control parameter guarantees convergence both for the Schrodinger dynamics and the 
stochastic dynamics in spite of the essential difference of these two types of dynamics. 
Also described are the prescriptions to reduce errors in the final approximate solution 
obtained after a long but finite dynamical evolution of quantum annealing. It is shown 
there that we can reduce errors significantly by an ingenious choice of annealing schedule 
(time dependence of the control parameter) without compromising computational com- 
plexity qualitatively. A review is given on the derivation of the convergence condition 
for classical simulated annealing from the view point of quantum adiabaticity using a 
classical-quantum mapping. 

1 Introduction 

An optimization problem is a problem to minimize or maximize a real single-valued 
function of multivariables called the cost function [HE]- If the problem is to maximize 
the cost function /, it suffices to minimize — /. It thus does not lose generality to consider 
minimization only. In the present paper we consider combinatorial optimization, in which 
variables take discrete values. Well-known examples are satisfiability problems (SAT), 
Exact Cover, Max Cut, Hamilton graph, and Traveling Salesman Problem. In physics, 
the search of the ground state of spin systems is a typical example, in particular, systems 
with quenched randomness like spin glasses. 

Optimization problems are classified roughly into two types, easy and hard ones. 
Loosely speaking, easy problems are those for which we have algorithms to solve in 
steps(=time) polynomial in the system size (polynomial complexity). In contrast, for 
hard problems, all known algorithms take exponentially many steps to reach the exact 
solution (exponential complexity). For these latter problems it is virtually impossible 
to find the exact solution if the problem size exceeds a moderate value. Most of the 
interesting cases as exemplified above belong to the latter hard class. 

It is therefore important practically to devise algorithms which give approximate but 
accurate solutions efficiently, i.e. with polynomial complexity. Many instances of com- 
binatorial optimization problems have such approximate algorithm. For example, the 
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Lin-Kernighan algorithm is often used to solve the traveling salesman problem within a 
reasonable time [3j. 

In the present paper we will instead discuss generic algorithms, simulated annealing 
(SA) and quantum annealing (QA). The former was developed from the analogy be- 
tween optimization problems and statistical physics [4j[5]. In SA, the cost function to be 
minimized is identified with the energy of a statistical-mechanical system. The system 
is then given a temperature, an artificially-introduced control parameter, by reducing 
which slowly from a high value to zero, wc hope to drive the system to the state with 
the lowest value of the energy (cost function), reaching the solution of the optimization 
problem. The idea is that the system is expected to stay close to thermal equilibrium 
during time evolution if the rate of decrease of temperature is sufficiently slow, and is 
thus lead in the end to the zero-temperature equilibrium state, the lowest-energy state. 
In practical applications SA is immensely popular due to its general applicability, reason- 
able performance, and relatively easy implementation in most cases. SA is usually used 
as a method to obtain an approximate solution within a finite computation time since it 
needs an infinitely long time to reach the exact solution by keeping the system close to 
thermal equilibrium. 

Let us now turn our attention to quantum annealing [51 [71 [HI [SI [101 E] El • In SA, we 
make use of thermal (classical) fluctuations to let the system hop from state to state over 
intermediate energy barriers to search for the desired lowest-energy state. Why then not 
try quantum-mechanical fluctuations (quantum tunneling) for state transitions if such 
may lead to better performance? In QA we introduce artificial degrees of freedom of 
quantum nature, non-commutative operators, which induce quantum fluctuations. We 
then ingeniously control the strength of these quantum fluctuations so that the system 
finally reaches the ground state, just like SA in which we slowly reduce the temperature. 
More precisely, the strength of quantum fluctuations is first set to a very large value for 
the system to search for the global structure of the phase space, corresponding to the 
high-temperature situation in SA. Then the strength is gradually decreased to finally 
vanish to recover the original system hopefully in the lowest-energy state. Quantum 
tunneling between different classical states replaces thermal hopping in SA. The physical 
idea behind such a procedure is to keep the system close to the instantaneous ground state 
of the quantum system, analogously to the quasi-equilibrium state to be kept during the 
time evolution of SA. Similarly to SA, QA is a generic algorithm applicable, in principle, to 
any combinatorial optimization problem and is used as a method to reach an approximate 
solution within a given finite amount of time. 

The reader may wonder why one should invent yet another generic algorithm when we 
already have powerful SA. A short answer is that QA outperforms SA in most cases, at 
least theoretically. Analytical and numerical results indicate that the computation time 
needed to achieve a given precision of the answer is shorter in QA than in SA. Also, the 
magnitude of error is smaller for QA than SA if we run the algorithm for a fixed finite 
amount of time. We shall show some theoretical bases for these conclusions in this paper. 
Numerical evidence is found in[9l[10l[IIl[H[Tll[l6l[l7l[l8l[l9l[20ll2ll|22]. 

A drawback of QA is that a full practical implementation should rely on the quantum 
computer because we need to solve the time-dependent Schrodinger equation of very 
large scale. Existing numerical studies have been carried out either for small prototype 
examples or for large problems by Monte Carlo simulations using the quantum-classical 
mapping by adding an extra (Trotter or imaginary-time) dimension [231 \24\ I25j . The 

^ The term quantum annealing first appeared in [121 [13] , in which the authors used quantum transitions for 
state search and the dynamical evolution of control parameters were set by hand as an algorithm. Quantum 
annealing in the present sense using natural Schrodinger dynamics was proposed later independently in [6] 
and [Zj. 
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latter mapping involves approximations, which inevitably introduces additional errors as 
well as the overhead caused by the extra dimension. Nevertheless, it is worthwhile to 
clarify the usefulness and limitations of QA as a theoretical step towards a new paradigm 
of computation. This aspect is shared by quantum computation in general whose practical 
significance will be fully exploited on the quantum computer. 

The idea of QA is essentially the same as quantum adiabatic evolution (QAE), which 
is now actively investigated as an alternative paradigm of quantum computation [26 . It 
has been proved that QAE is equivalent to the conventional circuit model of quantum 
computation [27], but QAE is sometimes considered more useful than the circuit model 
for several reasons including robustness against external disturbance. In the literature 
of quantum computation, one is often interested in the computational complexity of the 
QAE-based algorithm for a given specific problem under a fixed value of acceptable error. 
QAE can also be used to find the final quantum state when the problem is not a classical 
optimization. 

In some contrast to these situations on QAE, studies of QA are often focused not on 
computational complexity but on the theoretical convergence conditions for infinite-time 
evolution and on the amount of errors in the final state within a fixed evolution time. 
Such a difference may have lead some researchers to think that QA and QAE are to be 
distinguished from each other. We would emphasize that they are essentially the same 
and worth investigations by various communities of researchers. 

The structure of the present paper is as follows. Section [2] discusses the convergence 
condition of QA, in particular the rate of decrease of the control parameter representing 
quantum fluctuations. It will be shown there that a qualitatively faster decrease of the 
control parameter is allowed in QA than in SA to reach the solution. This is one of the 
explicit statements of the claim more vaguely stated above that QA outperforms SA. 
In Sec. [3] we review the performance analysis of SA using quantum-mechanical tools. 
The well-known convergence condition for SA will be rederived from the perspective 
of quantum adiabaticity. The methods and results in this section help us strengthen 
the interrelation between QA, SA and QAE. The error rate of QA after a finite-time 
dynamical evolution is analyzed in Sec. [4l There we explain how to reduce the final 
residual error after evolution of a given amount of time. This point of view is unique in 
the sense that most references of QAE study the time needed to reach a given amount of 
tolerable error, i.e. computational complexity. The results given in this section can be 
used to qualitatively reduce residual errors for a given algorithm without compromising 
computational complexity. Convergence conditions for stochastic implementation of QA 
are discussed in Sec. [S] The results are surprising in that the rate of decrease of the 
control parameter for the system to reach the solution coincides with that found in Sec. [2] 
for the pure quantum-mechanical Schrodinger dynamics. The stochastic (and therefore 
classical) dynamics shares the same convergence conditions as fully quantum dynamics. 
Summary and outlook are described in the final section. 

The main parts of this paper (Sees. [2l |4] and [5]) are based on the PhD Thesis of one of 
the authors (S.M.) [28] as well as several original papers of the present and other authors 
as will be referred to appropriately. The present paper is not a comprehensive review 
of QA since an emphasis is given almost exclusively to the theoretical and mathematical 
aspects. There exists an extensive body of numerical studies and the reader is referred 
to [3 [lOl [11] for reviews. 
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2 Convergence condition of QA — Real-time Schrodinger 
evolution 



The convergence condition of QA with the real-time Schrodinger dynamics is investigated 
in this section, following f29l. We first review the proof of the adiabatic theorem [30 to 
be used to derive the convergence condition. Then introduced is the Ising model with 
transverse field as a simple but versatile implementation of QA. The convergence condition 
is derived by solving the condition for adiabatic transition with respect to the strength 
of the transverse field. 



2.1 Adiabatic theorem 

Let us consider the general Hamiltonian which depends on time t only through the di- 
mensionless time s = t/r, 

Hit)=H(^^^^His). (1) 

The parameter r is introduced to control the rate of change of the Hamiltonian. In natural 
quantum systems, the state vector \ip{t)) follows the real-time Schrodinger equation, 

i^^\m) = Hit)m)), (2) 

or, in terms of the dimensionless time, 

i±\i,{s))^rHis)\m), (3) 

where we set h — 1. We assume that the initial state is chosen to be the ground state 
of the initial Hamiltonian H{0) and that the ground state of H{s) is not degenerate for 
s > 0. We show in the next section that the transverse- field Ising model, to be used as 
H{t) in most parts of this paper, has no degeneracy in the ground state (except possibly in 
the limit of i — > oo). If r is large, the Hamiltonian changes slowly and it is expected that 
the state vector keeps track of the instantaneous ground state. The adiabatic theorem 
provides the condition for adiabatic evolution. To see this, we derive the asymptotic form 
of the state vector with respect to the parameter r. 

Since we wish to estimate how close the state vector is to the ground state, it is natural 
to expand the state vector by the instantaneous eigenstates of H{s). Before doing so, we 
derive useful formulas for the eigenstates. The kth instantaneous eigenstate of H{s) with 
the eigenvalue efc(s) is denoted as \k{s)), 

His)\ki.s))=Ski.s)\ki.s)). (4) 

We assume that |0(s)) is the ground state of H{s) and that the eigenstates are orthonor- 
mal, (j(s)l k{s)) = Sjk- From differentiation of ^ with respect to s, we obtain 

(j(s)i f \m) = , , , o-(^)i iM^)> , (5) 

as ^j[s) — £k[s) as 

where j ^ k. In the case of j = k, the same calculation does not provide any meaningful 
result. We can, however, impose the following condition, 

{k{s)\^^\k{s))^Q. (6) 
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This condition is achievable by the time-dependent phase shift: If |fc(s)) — e'^*-"' \k{s)), 
we find 

{k{s)\^J~k{s))^i^^+{k{s)\^^\k{s)). (7) 
The second term on the right-hand side is purely imaginary because 



{k{s)\-\k{s)) 



+ (fc(,)|^|fc(,))^^(fc(,)|fc(,))^0. 



(8) 



Thus, the condition ([6]) can be satisfied by tuning the phase factor 9{s) even if the original 
eigenstate does not satisfy it. 

Theorem 2.1. // the instantaneous ground state of the Hamiltonian H(s) is not degen- 
erate for s > and the initial state is the ground state at s = 0, i.e. |V'(0)) = |0(0)), the 
state vector |^/'(s)) has the asymptotic form in the limit of large r as 



ms)) =Y^c,{s)e-' 



iT0j(s) 



(9) 



co{s)^1 + 0{t-^), 
c,^o(s) ~ - Uj(0) -e'^['^^(^)-*«(^)U,(s)l +0(r-2), 

T L J 

where (f>j{s) = ds'ej{s') , Aj(s) = ej{s) — £o(s) and 

1 



A,{s 



^0-(^)l^|0(^)). 



(10) 

(11) 

(12) 



Proof. Substitution of ([9]) into the Schrodinger equation ([3]) yields the equation for the 
coefficient Cj{s) as 



dc^ 
ds 



ds 



where we used (O and (O- Integration of this equation yields 

c,{s)=c,{0) + Yl / dgcfc(g) ^ (^■(g)|i_i£2|fc(g)) 



£j(S) -efe(S) 



(13) 



(14) 



Since the initial state is chosen to be the ground state of H{Q), co(0) = 1 and Cj^o(O) — 0. 
The second term on the right-hand side is of the order of r^^ because its integrand rapidly 
oscillates for large r. In fact, the integration by parts yields the T~^-factor. Thus, Cj^o{0) 
is of order at most. Hence only the k = term in the summation remains up to the 
order of r^^, 



Cj#o(s) 



^ir[0j(s)-0o(s)] 



A,(S) 



(.(^11^ 



|0(S))+O(7 



and the integration by parts yields (|lip . 



(15) 



□ 



Remark. The condition for the adiabatic evolution is given by the smallness of the excita- 
tion probability. That is, the right-hand side of (|lip should be much smaller than unity. 
This condition is consistent with the criterion of the validity of the above asymptotic 
expansion. It is represented by 

T»|v4,(s)|. (16) 
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Using the original time variable t, this adiabaticity condition is written as 



= <5< 1. 



(17) 



This is the usual expression of adiabaticity condition. 



2.2 Convergence conditions of quantum annealing 

In this section, we derive the condition which guarantees the convergence of QA. The 
problem is what annealing schedule (time dependence of the control parameter) would 
satisfy the adiabaticity condition ^T7\ . We solve this problem on the basis of the idea of 
Somma et al |3T] developed for the analysis of SA in terms of quantum adiabaticity as 
reviewed in Sec. [S] 



2.2.1 Transverse field Ising model 

Let us suppose that the optimization problem we wish to solve can be represented as the 
ground-state search of an Ising model of general form 

N 

^^Ising = - X! •^^'^^ ^ X! ■^'J'^i'^j ~ XI ■^Vk^^t^j^k > (18) 

i=l ij ijk 

where the erf (a = x, y, z) are the Pauli matrices, components of the spin i operator at 
site i. The eigenvalue of erf is +1 or —1, which corresponds the classical Ising spin. Most 
combinatorial optimization problems can be written in this form by, for example, mapping 
binary variables (0 and 1) to spin variables (±1). Another important assumption is that 
the Hamiltonian (|18p is extensive, i.e. proportional to the number of spins N for large 
N. 

To realize QA, a fictitious kinetic energy is introduced typically by the time-dependent 
transverse field 

N 

HMt)^~mY,a^, (19) 

which induces spin flips, quantum fluctuations or quantum tunneling, between the two 
states erf = 1 and erf = —1, thus allowing a quantum search of the phase space. Ini- 
tially the strength of the transverse field T{t) is chosen to be very large, and the total 
Hamiltonian 

H(t) -i?Mng + ifTF(i) (20) 

is dominated by the second kinetic term. This corresponds to the high-temperature 
limit of SA. The coefhcient r{t) is then gradually and monotonically decreased toward 
0, leaving eventually only the potential term i?ising- Accordingly the state vector \ip{t)), 
which follows the real-time Schrodinger equation, is expected to evolve from the trivial 
initial ground state of the transverse-field term to the non-trivial ground state of 
(|18p . which is the solution of the optimization problem. An important issue is how slowly 
we should decrease r{t) to keep the state vector arbitrarily close to the instantaneous 
ground state of the total Hamiltonian (|20p . The following Theorem provides a solution 
to this problem as a sufficient condition. 

Theorem 2.2. The adiabaticity |i7| j for the transverse-field Ising model \20\) yields the 
time dependence ofT{t) as 

r(t) = a(5t + c)-i/(2^-i) (21) 
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for t > to (for a given positive to) as a sufficient condition of convergence of QA. Here a 
and c are constants ofO{N'^) and S is a small parameter to control adiabaticity appearing 
m (73). 

The following Theorem proved by Hopf [32] will be useful to prove this Theorem. See 
Appendix]^ for the proof. 

Theorem 2.3. // all the elements of a square matrix M are strictly positive, Mij > 0, 
its maximum eigenvalue Aq and any other eigenvalues A satisfy 

|A| < ^Ao, (22) 

where k is defined by 

K = max . (26) 

i,j,k Mjk 

Proof of Theorem 12.21 We show that the power decay (pij) satisfies the adiabaticity 
condition p7p which guarantees convergence to the ground state of -ffising as t 00. . 
For this purpose we estimate the energy gap and the time derivative of the Hamiltonian. 
As for the latter, it is straightforward to see 



< -iV-li, (24) 



since the time dependence of H{t) lies only in the kinetic term iJTF(i), which has N 
terms. Note that dT/dt is negative. 

To estimate a lower bound for the energy gap, we apply Theorem l2.3l to the operator 
M = [E^ — H{t))^ . We assume that the constant S+ satisfies > i?max + Tq, where 
Fq = F(to) and £^max is the maximum eigenvalue of the potential term i?ising- AH the 
elements of the matrix M are strictly positive in the representation that diagonalizes {erf} 
because E+ — H{t) is non-negative and irreducible, that is, any state can be reached from 
any other state within at most N steps. 

For t > to, where T{t) < Fq, all the diagonal elements of E+ — H{t) are larger than any 
non-zero off-diagonal element F(t). Thus, the minimum element of M, which is between 
two states having all the spins in mutually opposite directions, is equal to A^!F(i)^, where 
N\ comes from the ways of permutation to flip spins. Replacement of 7?TF(i) by — iVFo 
shows that the maximum matrix element of M has the upper bound (E^ — Erain + NTo)^ , 
where i?min is the lowest eigenvalue of iyisint^ Thus, we have 



^ < '^^'ZTZJn ■ (25) 



{E+ - E^i^ + iVFo^^ 
N\T{t) 

If we denote the eigenvalue of H{t) by ej{t), (|^^ is rewritten as 



[E+ e,it)f <^[E+- eoit)f . (26) 



Substitution of ([251 into the above inequality yields 

2[E+ - eo{t)]Nl _ 

N{E+ - En,in + NT^ 



^ w > .r.:'^:^ "'. z^.N m^ - ATitr, (27) 



where we used 1 - ((k- !)/(«+ 1))^/^ > 2/N{k+1) for k > 1 and iV > 1. The coefficient 
A is estimated using the Stirling formula as 

^^ 2VSNIE (28) 
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where e"^^^ is inaxt>t(,{eo(i)}. This expression imphes that A is exponentially small for 
large N. 

Now, by combination of the above estimates ((24|) and (p7)) , we find that the sufficient 
condition for convergence for t > to is 

^ = 5 < 1, (29) 



v42r(<)2^ dt 

where S is an arbitrarily small constant. By integrating this differential equation, we 
obtain (gT]). □ 

Remark. The asymptotic power decay of the transverse field guarantees that the exci- 
tation probability is bounded by the arbitrarily small constant 5^ at each instant. This 
annealing schedule is not valid when T(t) is not sufficiently small because we evaluated 
the energy gap for r{t) < Tq {t > tg). If we take the limit ^ Oi Tq increases indefinitely 
and the coefficient a in ([2T|) diverges. Then the result pT|) does not make sense. This is 
the reason why a finite positive time should be introduced in the statement of Theorem 

El 



2.2.2 Transverse ferromagnetic interactions 

The same discussions as above apply to QA using the transverse ferromagnetic interac- 
tions in addition to a transverse field. 



Hti (t) ^ -Fti (t) E < + E « • (30) 




The second summation runs over appropriate pairs of sites that satisfy extensiveness of 
the Hamiltonian. A recent numerical study shows the effectiveness of this type of quantum 
kinetic energy |,18j . The additional transverse interaction widens the instantaneous energy 
gap between the ground state and the first excited state. Thus, it is expected that an 
annealing schedule faster than (pij) satisfies the adiabaticity condition. The following 
Theorem supports this expectation. 

Theorem 2.4. The adiabaticity for the quantum system iJising + H'Yiit) yields the time 
dependence ofT{t) for t > t^ as 

rTi(t)oct-i/(^-i). (31) 

Proof. The transverse interaction introduces non-zero off-diagonal elements to the Hamil- 
tonian in the representation that diagonalizes erf . Consequently, any state can be reached 
from any other state within N/2 steps at most. Thus, the strictly positive operator is 
modified to (£'+ — i?ising — -ffTi(i))^''^, which leads to the lower bound for the energy gap 
as a quantity proportional to rTi(i)^^^. The rest of the proof is the same as Theorem 

o □ 

The above result implies that additional non-zero off-diagonal elements of the Hamil- 
tonian accelerates the convergence of QA. It is thus interesting to consider the many-body 
transverse interaction of the form 

N 

HMTl{t) = -TMTl{t)Y[il + <jf). (32) 
i=l 

All the elements of Hmti are equal to — rMTi(i) in the representation that diagonalizes 
erf. In this system, the following Theorem holds. 
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Theorem 2.5. The adiabaticity for the quantum system i?ising + -ffTMi(i) yields the time 
dependence ofT(t) for t > to as 

rMTi(0 oc . (33) 

t 

Proof. We define the strictly positive operator as M = — iJising ~ -^mtiIO- The 
maximum and minimum matrix elements of M are — ii'min + rMTi(i) and rMTi(i)j 
respectively. Thus we have 



E+ — E^cain + TmTI {t) 

rMTi(i) 



(34) 



K—\_ i?+ — i?niin 2rMTl(i) ^gg^ 

K + 1 i?+ — i?min + 2rMTl(i) ~ -B+ — -Emin 

The inequality for the strictly positive operator ((22)) yields 

A,(0 > '^""f ^ irMTi(t), (36) 

^+ ^ ^min 

where A is ©(A^^). Since the matrix element of the derivative of the Hamiltonian is 
bounded as 



O-Wl^io(t)) 



< _2^^4ill, (37) 
at 



we find that the sufficient condition for convergence with the many-body transverse in- 
teraction is 

<5«1. (38) 



2^ dr. 



A^TMTiity dt 

Integrating this differential equation yields the annealing schedule p3l) . □ 



2.2.3 Computational complexity 

The asymptotic power-low annealing schedules guarantee the adiabatic evolution during 
the annealing process. The power-law dependence on t is much faster than the log-inverse 
law for the control parameter in SA, T{t) = pN/ log(at -I- 1), to be discussed in the next 
section, first proved by Geman and Geman |33j . However, it does not mean that QA 
provides an algorithm to solve NP problems in polynomial time. In the case with the 
transverse field only, the time for r{t) to reach a sufficiently small value e (which implies 
that the system is sufficiently close to the final ground state of -ffising whence Htf is a 
small perturbation) is estimated from (|2T|) as 

. (39) 

This relation clearly shows that the QA needs a time exponential in N to converge. 

For QA with many-body transverse interactions, the exponent of t in the annealing 
schedule ([55)1 does not depend on the system size N. Nevertheless, it also does not 
mean that QA provides a polynomial-time algorithm because of the factor 2^. The 
characteristic time for Fmti to reach a sufficiently small value e is estimated as 

^MTi ~ (40) 



which again shows exponential dependence on TV. 
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These exponential computational complexities do not come as a surprise because 
Theorems 12.21 12.41 and 12.51 all apply to any optimization problems written in the generic 
form (jlSp . which includes the worst cases of most difficult problems. Similar arguments 
apply to SA tMj. 

Another remark is on the comparison of r(t)(cx t^^/'^^"-'^') in QA with T{t){oc 
N/\og{at + 1)) in SA to conclude that the former schedule is faster than the latter. 
The transverse-field coefficient F in a quantum system plays the same role qualitatively 
and quantitatively as the temperature T does in a corresponding classical system at least 
in the Hopfield model in a transverse field 05]. When the phase diagram is written in 
terms of F and a (the number of embedded patterns divided by the number of neurons) 
for the ground state of the model, the result has precisely the same structure as the T-a 
phase diagram of the finite-temperature version of the Hopfield model without transverse 
field. This example serves as a justification of the direct comparison of F and T at least 
as long as the theoretical analyses of QA and SA are concerned. 

3 Convergence condition of SA and quantum adia- 
baticity 

We next study the convergence condition of SA to be compared with QA. This prob- 
lem was originally solved by Geman and Geman [33j using the theory of inhomogeneous 
Markov chain as described in the Quantum Monte Carlo context in Sec. O It is quite 
surprising that their result is reproduced using the quantum adiabaticity condition ap- 
plied after a classical-quantum mapping |31j . This approach is reviewed in this section, 
following [31j . to clarify the correspondence between the quasi-equilibrium condition for 
SA in a classical system and the adiabaticity condition in the corresponding quantum 
system. The analysis will also reveal an aspect related to the equivalence of QA and 
QAE. 

3.1 Classical-quantum mapping 

The starting point is an expression of a classical thermal expectation value in terms of a 
quantum ground-state expectation value. A well-known mapping between quantum and 
classical systems is to rewrite the former in terms of the latter with an extra imaginary- 
time (or Trotter) dimension |24j . The mapping discussed in the present section is a 
different one, which allows us to express the thermal expectation value of a classical 
system in terms of the ground-state expectation value of a corresponding quantum system 
without an extra dimension. 

Suppose that the classical Hamiltonian, whose value we want to minimize, is written 
as an Ising spin system as in ([TSl) : 

N 

i=l ij ijk 

The thermal expectation value of a classical physical quantity Q({o'f }) is 

= W^E^"''''Q(W)' (42) 

where the sum runs over all configurations of Ising spins, i.e. over the values taken by 
the 2-components of the Pauli matrices, af = ai{ztl) (Vi). The symbol {ai} stands for 
the set {cTi, a2, • • • , a^}. 

An important clement is the following Theorem. 
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Theorem 3.1. The thermal expectation value 114 is equal to the expectation value of Q 
by the quantum wave function 

|^(r))=e-'^^/2^|{a,}), (43) 

where \{ai})is the basis state diagonalizing each af as (T;. The sum runs over all such 
possible assignments. 

Assume T > 0. The wave function is the ground state of the quantum Hamilto- 
nian 

H,{T) = -xY^^iiT) ^ -xY.(^i'e^"^), (44) 
j j 

where Hj is the sum of the terms of the Hamiltonian |^ j[ ) involving site j , 



Hj = -Jj^'j - Jokcj'jcrl - ^ Jjki<Jj<J^<Ji . (45) 

fc kl 

The coefficient x is defined by x ~ e^^^ with p — maxj \Hj\. 
Proof. The first half is trivial: 



(^(r)|g|^(r)) 
mrmT)) - z{T) 

To show the second half, wc first note that 



}7f^,Y.^-''"{{^^}\Q\{^^})^{Q)T■ (46) 



since the operator cr^ just changes the order of the above summation. It is also easy to 
see that 

aie-^"'^ = e^"'e-P"'^ai (48) 

because 

as both H and Hj are diagonal in the present representation and H — Hj does not include 
(t|, so [H — Hj, aj] — 0. We therefore have 

i/^(r)|V(T)) = (a^-e'^^Oe-^^/'Eli'^*}) =0- (^0) 

Thus \ip{T)) is an eigenstate of Hq{T) with eigenvalue 0. In the present representation, 
the non- vanishing off-diagonal elements of —Hq{T) are all positive and the coefficients of 
\ip{T)) are also all positive as one sees in p3)) . Then \ip{T)) is the unique ground state 
of Hq (T) according to the Perron- Frobenius Theorem . □ 

A few remarks are in order. In the high-temperature limit, the quantum Hamiltonian 
is composed just of the transverse-field term, 

Hq{T^^)^-Y,K-l). (51) 

j 

Correspondingly the ground-state wave function \ijj{T — )■ 00)) is the simple summation 
over all possible states with equal weight. In this way the thermal fluctuations in the 
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original classical system are mapped to the quantum fluctuations. The low-temperature 
limit has, in contrast, the purely classical Hamiltonian 

H,{T^0)^xJ2^^"' (52) 

j 

and the ground state of Hq{T « 0) is also the ground state of H as is apparent from 
the definition (|43)) . Hence the decrease of thermal fluctuations in SA is mapped to the 
decrease of quantum fluctuations. As explained below, this correspondence allows us 
to analyze the condition for quasi-equilibrium in the classical SA using the adiabaticity 
condition for the quantum system. 



3.2 Adiabaticity and convergence condition of SA 

The adiabaticity condition applied to the quantum system introduced above leads to the 
condition of convergence of SA. Suppose that we monotonically decrease the temperature 
as a function of time, T{t), to realize SA. 

Theorem 3.2. The adiabaticity condition for the quantum system of Hq{T) yields the 
time dependence of T{t) as 

in the limit of large N . The coefficient a is exponentially small in N . 

A few Lemmas will be useful to prove this Theorem. 

Lemma 3.3. The energy gap A(r) of Hq{T) between the ground state and the first 
excited state is bounded below as 

A(T) > a\/]Ve-(''f+^)^, (54) 

where a and c are N -independent positive constants, in the asymptotic limit of large N . 

Proof. The analysis of Sec. 12.2.11 applies with the replacement of r{t) by x = e~^P 
and eo{t) — 0. This latter condition comes from Hq{T)\tjj{T)) — 0. The condition 
T{t) < To {t > to) is unnecessary here because the off-diagonal element x can always be 
chosen smaller than the diagonal elements by adding a positive constant to the diagonal. 
Equation (P7|) gives 

A,{t) > Ae-l^P^ (55) 



and A satisfies, according to ([28] 



A w fo%/2^e-'=^ (56) 

with b and c positive constants of 0{N'^). □ 

Lemma 3.4. The matrix element of the derivative of Hq{T), relevant to the adiabaticity 
condition, satisfies 

{ipi{T)\dTHq{T)\tl){T)) = '2k^ ' ^ 

where ipi (T) is the normalized first excited state of Hq (T) . 
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Proof. By differentiating the identity 



H,{TmT))=0 (58) 

we find 



IV^T)). (59) 



This relation immediately proves the Lemma if we notice that the ground state energy of 
Hq[T) is zero and therefore Hq{T)\^i{T)) = A(r)|?Ai(T)). □ 

Lemma 3.5. The matrix element of H satisfies 

\{MT)\H\i;{T))\<pN^/Z(T). (60) 

Proof. There are N terms in _ff = Hj , each of which is of norm of at most p. The 
factor ^JZiX) appears from normalization of |'i/;(T)). □ 

Proof of Theorem 13.21 The condition of adiabaticity for the quantum system Hq (T) 
reads 

1 



{MT)\dTil,{T)\^{T)) — 



dt 



(61) 



A(T)2yz(T) 

with sufficiently small 6. If we rewrite the matrix element by Lemma 13.41 . the left-hand 



side is 



\{MT)\HmT))\ 



2kBT^AiT)^/Z(T) 



dT 



dt 



(62) 



By replacing the numerator by its bound in Lemma 13.51 we have 



pN 



dT 



dt 



S < 1 (63) 



as a sufficient condition for adiabaticity. Using the bound of Lemma 13.31 and integrating 
the above differential equation for T{t) noticing dT/dt < 0, we reach the statement of 
Theorem ESI □ 



3.3 Remarks 

Equation (|53p reproduces the Geman-Geman condition for convergence of S A [33] . Their 
method of proof is to use the theory of classical inhomogeneous {i.e. time-dependent) 
Markov chain representing non-equilibrium processes. It may thus be naively expected 
that the classical system under consideration may not stay close to equilibrium during 
the process of SA since the temperature always changes. It therefore comes as a surprise 
that the adiabaticity condition, which is equivalent to the quasi-equilibrium condition 
according to Theorem 13.11 leads to Theorem 13.21 The rate of temperature change in this 
latter Theorem is slow enough to guarantee the quasi-equilibrium condition even when 
the temperature keeps changing. 

Also, Theorem 13.21 is quite general, covering the worst cases, as it applies to any 
system written as the Ising model of (|4ip. This fact means that one may apply a faster 
rate of temperature decrease to solve a given specific problem with small errors. The 
same comment applies to the QA situation in Sec. [2] 

Another remark is on the relation of QA and QAE. Mathematical analyses of QA 
often focus their attention to the generic convergence conditions in the infinite-time limit 
as seen in Sees. [5]and[5]as well as in the early paper [7], although the residual energy after 
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finite-time evolution has also been extensively investigated mainly in numerical studies. 
This aspect may have lead some researchers to think that QA is different from QAE, 
since the studies using the latter mostly concern the computational complexity of finite- 
time evolution for a given specific optimization problem using adiabaticity to construct 
an algorithm of QAE. As has been shown in the present and the previous sections, the 
adiabaticity condition also leads to the convergence condition in the infinite-time limit for 
QA and SA. In this sense QA, QAE and even SA share essentially the same mathematical 
background. 



4 Reduction of errors for finite-time evolution 

In Sec. [21 we discussed the convergence condition of QA implemented for the transverse- 
field Ising model. The power decrease of the transverse field guarantees the adiabatic evo- 
lution. This annealing schedule, however, does not provide practically useful algorithms 
because infinitely long time is necessary to reach the exact solution. An approximate 
algorithm for finite annealing time r should be used in practice. Since such a finite-time 
algorithm does not satisfy the generic convergence condition, the answer includes a cer- 
tain amount of errors. An important question is how the error depends on the annealing 
time r. 

Suzuki and Okada showed that the error after adiabatic evolution for time r is gener- 
ally proportional to r"^ in the limit of large r with the system size N kept finite ^16j. In 
this section, we analyze their results in detail and propose new annealing schedules which 
show smaller errors proportional to r~^™ (m > 1) IVf]. This method allows us to reduce 
errors by orders of magnitude without compromising the computational complexity apart 
from a possibly moderate numerical factor. 



4.1 Upper bound for excitation probability 

Let us consider the general time-dependent Hamiltonian ([T|). The goal of this section is 
to evaluate the excitation probability (closely related with the error probability) at the 
final time s = 1 under the adiabaticity condition (I16|) . 

This task is easy because we have already obtained the asymptotic form of the exci- 
tation amplitude (jlip . The upper bound for the excitation probability is derived as 



|0-(i)IVi(i)>| 



|c,^o(l)|'<^' 



\MO)\ + \A,il)\ +0(r-3). 



(64) 



This formula indicates that the coefficient of the term is determined only by the state 
of the system at s = and 1 and vanishes if Aj{s) is zero at s = and 1. 

When the r^^-term vanishes, a similar calculation yields the next order term of the 
excitation probability. If H'{0) = H'{1) = 0, the excitation amplitude Cj^o{^) is at most 
of order and then co(l) w 1 + 0{t^^). Therefore we have 



Cjyo(l) 



d.^^^. — m\ ^ io(.)) + o{t-^) 



A,(s) 



1 r 



A?^(0) 



(65) 



where we defined 



A,(s) 



m-{- 



, , ,,d™i?(s), , 



ds"^ 



(66) 



To derive the second line of ([55]) . we used integration by parts twice, and ([5|) and ©. The 
other terms vanish because of the assumption H'{0) = H'{1) = 0. Thus the upper 
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Figure 1: Examples of annealing schedules with reduced errors Usted in ([70]) - ([73]) . 



bound of the next order for the excitation probability under this assumption is obtained 



as 



(j(i)IV^(i)> 



< 



1 



Af\0) 



(67) 



It is easy to see that the r "^-term also vanishes when H"{0) = H"{1) = 0. It is straight- 
forward to generalize these results to prove the following Theorem. 

Theorem 4.1. // the kth derivative of H{s) is equal to zero at s = and 1 for all 
fc = l,2,---,m— 1, the excitation probability has the upper bound 



(J(l)l^(l)) 



< 



1 



4™^(i) 



0(7 



(68) 



4.2 Annealing schedules with reduced errors 

Although we have so far considered the general time-dependent Hamiltonian, the ordinary 
Hamiltonian for QA with finite annealing time is composed of the potential term and the 
kinetic energy term, 

H{s) - /(s)ffpot + [1 - f{s)] i/kin., (69) 

where i?pt and i?kin generalize iJising and i?TF in Sec. [21 respectively. The function 
/(s), representing the anneahng schedule, satisfies /(O) — and /(I) — 1. Thus H{Q) — 
Hkin and H{1) — iJpot- The ground state of i?pot corresponds to the solution of the 
optimization problem. The kinetic energy is chosen so that its ground state is trivial. The 
above Hamiltonian connects the trivial initial state and the non-trivial desired solution 
after evolution time r. 

The condition for the T~^™-term to exist in the error is obtained straightforwardly 
from the results of the previous section because the Hamiltonian (j69p depends on time 
only through the annealing schedule f{s). It is sufficient that the A:th derivative of /(s) 
is zero at s = and 1 for A; = 1, 2, • • • , m — 1. We note that f{s) should belong to C™, 
that is, f{s) is an mth differentiable function whose mth derivative is continuous. 
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Examples of the annealing schedules fm{s) with the r ^"^ error rate are the following 
polynomials: 

/i(5)=5, (70) 

/2(s) = 52(3_2s), (71) 

/3(s) = 5^(10 -15s + 6s2), (72) 

/4(s) = s''(35- 84s + 70s2-20s3). (73) 

The linear annealing schedule /i(s), which shows the error, has been used in the past 
studies. Although we here list only polynomials symmetrical with respect to the point 
s — 1/2, this is not essential. For example, f{s) = (1 — cos(7rs^))/2 also has the t"** error 
rate because /'(O) = /'(I) = /"(O) = but /"(I) = -2tt^. 



4.3 Numerical results 



4.3.1 Two-level system 

To confirm the upper bound for the excitation probability discussed above, it is instructive 
to study the two-level system, the Landau-Zener problem, with the Hamiltonian 



2 •' It 



(74) 



The energy gap of -ffLz(i) has the minimum 2a at /(s) = 1/2. If the annealing time r 
is not large enough to satisfy (|16p . non-adiabatic transitions occur. The Landau-Zener 



theorem [38l[39] provides the excitation probability Pcx(t) 

r 2 



(i(i)IV'(i)) 



as 



exp 



(75) 



where s* denotes the solution of /(s*) = 1/2. On the other hand, if r is sufficiently large, 
the system evolves adiabatically. Then the excitation probability has the upper bound 
(l68l). which is estimated as 



PUt) 



< 



j-2m 



(/l2 +4q2) 



m+2 



ds" 



(0) 



ds'^ 



(1) 



(76) 



We numerically solved the Schrodinger equation ([2]) for this system ([74| with the 
Runge-Kutta method [41] . Figure [2] shows the result for the excitation probability with 
anneahng schedules ([70)1 - (175)) . The initial state is the ground state of -ffLz(O). The 
parameters are chosen to be /i = 2 and a = 0.2. The curved and straight lines show 
(|75p and ([7S)) . respectively. In the small and large r regions, the excitation probability 
perfectly fits to those two expressions. 



4.3.2 Spin glass model 

We next carried out simulations of a rather large system, the Ising spin system with 
random interactions. The quantum fluctuations are introduced by the uniform transverse 
field. Thus, the potential and kinetic energy terms are defined by 

N 

N 

i/kin = -r^<. (78) 

1=1 
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Figure 2: The annealing-time dependence of the excitation probabihty for the two- level 
system (f74|l using schedules (|70l) to (|73]l . The curved and straight lines show (f75|) and (f76l) 
for each annealing schedule, respectively. The parameters in (j71|) are chosen to be /i = 2 and 
a = 0.2. 



The initial state, namely the ground state of 77kin, is the all-up state along the x axis. 

The difference between the obtained approximate energy and the true ground state 
energy (exact solution) is the residual energy i?ros- ft is a useful measure of the error rate 
of QA. It has the same behavior as the excitation probability because it is rewritten as 

£;„s = (V^(l)|i?pot|^(l)>-£o(l) (79) 
= ^A,(l)|(j(l)|^(l))f . (80) 

J>0 

Therefore E^cs is expected to be asymptotically proportional to r^^™ using the improved 
annealing schedules. 

We investigated the two-dimensional square lattice of size 3x3. The quenched random 
coupling constants {Jij} are chosen from the uniform distribution between —1 and -1-1, as 
shown in Fig. [3l The parameters are h = 0.1 and F = 1. Figure |4] shows the r dependence 
of the residual energy using the annealing schedules ([70]) - ([75)) . Straight lines representing 
^-2m _ 1,2,3,4) are also shown for comparison. The data clearly indicates the 
T^^'"-law for large r. The irregular behavior around -Bros ~ 10~^^ comes from numerical 
rounding errors. 



4.3.3 Database search problem 

As another example, we apply the improved annealing schedule to the database search 
problem of an item in an unsorted database. Consider N items, among which one is 
marked. The goal of this problem is to find the marked item in a minimum time. The 
pioneering quantum algorithm proposed by Grover [42j solves this task in time of order 
vW, whereas the classical algorithm tests items on average. Farhi et al. [26] proposed 
a QAE algorithm and Roland and Cerf [43] found a QAE-based algorithm with the same 
computational complexity as Grover's algorithm. Although their schedule is optimal in 
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Figure 3: Configuration of random interactions {Jij} on tlie 3x3 square lattice which we 
investigated, and spin configuration of the target state. The sohd and dashed hues indicate 
ferromagnetic and antiferromagnetic interactions, respectively. 




100 1000 10000 100000 
T 

Figure 4: The anneahng-time dependence of the residual energy for the two-dimensional spin 
glass model with improved annealing schedules. The solid lines denote functions proportional 
to T~^™ (m = 1, 2, 3, 4). The parameter values are h = 0.1 and F = 1. 



18 



the sense that the excitation probability by the adiabatic transition is equal to a small 
constant at each time, it has the error rate. We show that annealing schedules with 
the r~^™ error rate can be constructed by a slight modification of their optimal schedule. 

Let us consider the Hilbert space which has the basis states |i) (i = 1, 2, • • • , N), and 
the marked state is denoted by |m). Suppose that we can construct the Hamiltonian (|69p 
with two terms, 

i?pot = 1- |m)(m|, (81) 

N N 

^kin = i-^EEi*)oi- (82) 

i=l J=l 

The Hamiltonian iJpot can be applied without the explicit knowledge of |m), the same 
assumption as in Grover's algorithm. The initial state is a superposition of all basis 
states, 

which does not depend on the marked state. The energy gap between the ground state 
and the first excited state, 



Ai(s) = ^l-4^/(.)[l -/(.)], (84) 

has a minimum at /(s) = 1/2. The highest eigenvalue £2(5) = 1 is (A^— 2)-fold degenerate. 

To derive the optimal annealing schedule, we briefly review the results reported by 
Roland and Cerf [43 . When the energy gap is small {i.e. for /(s) « 1/2), non-adiabatic 
transitions are likely to occur. Thus we need to change the Hamiltonian carefully. On 
the other hand, when the energy gap is not very small, too slow a change wastes time. 
Thus the speed of parameter change should be adjusted adaptively to the instantaneous 
energy gap. This is realized by tuning the annealing schedule to satisfy the adiabaticity 
condition p6p in each infinitesimal time interval, that is, 

tiia.* (85) 



where 5 is a small constant. In the database search problem, this condition is rewritten 
as 

^/N~T d/ , 



rA^Ai(s)3 ds 

After integration under boundary conditions /(O) = and /(I) = 1, we obtain 



(86) 



As plotted by a solid line in Fig. [5l this function changes most slowly when the energy 
gap takes the minimum value. It is noted that the annealing time is determined by the 
small constant 5 as 

r = ^. (88, 



which means that the computation time is of order v N similarly to Grover's algorithm. 

The optimal annealing schedule ((87|) shows the r^^ error rate because its derivative 
is non- vanishing at s = and 1. It is easy to see from (|87l) that the simple replacement 
of s with fm{s) fulfils the condition for the r^^™ error rate. We carried out numerical 
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Figure 5: The optimal annealing schedules for the database search problem {N = 64). The 
solid line denotes the original optimal schedule (j87p and the dashed lines are for the modified 
schedules. 




[ . . d 
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r 

Figure 6: The annealing-time dependence of the residual energy for the database search 
problem (iV = 64) with the optimal annealing schedules described in Fig. [5l The solid lines 
represent functions proportional to r"^™ (m = 1,2,3,4). 
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simulations for iV = 64 with such anneahng schedules, /o[rt (s) = /opt if mis)), as plotted 

by dashed lines in Fig. [S] As shown in Fig. [HI the residual energy with fjy^tis) is propor- 
tional to r"^"*. The characteristic time for the r"^"* error rate to show up increases 
with m: Since the modified optimal schedule fjy^tis) has a steeper slope at s = 1/2 than 
/opt(s)) a longer annealing time is necessary to satisfy the adiabaticity condition (|86p . 
Nevertheless, the difference in slopes of fo^t is) is only a factor of 0(1), and therefore Tc 
is still scaled as ^/N. Significant qualitative reduction of errors has been achieved without 
compromising computational complexity apart from a numerical factor. 



4.4 Imaginary-time Schrodinger Dynamics 

So far, we have concentrated on QA following the real-time (RT) Schrodinger dynamics. 
From the point of view of physics, it is natural that the time evolution of a quantum 
system obeys the real-time Schrodinger equation. Since our goal is to find the solution 
of optimization problems, however, we need not stick to physical reality. We therefore 
investigate QA following the imaginary-time (IT) Schrodinger dynamics here to further 
reduce errors. 

The IT evolution tends to filter out the excited states. Thus, it is expected that QA 
with the IT dynamics can find the optimal solution more efficiently than RT-QA. Stella 
et al. [20] have investigated numerically the performance of IT-QA and conjectured that 
(i) the IT error rate is not larger than in the RT, and that (ii) the asymptotic behavior of 
the error rate for r — > cx) is identical for IT-QA and RT-QA. We prove their conjectures 
through the IT version of the adiabatic theorem. 



4.4.1 Imaginary-time Schrodinger equation 

The IT Schrodinger equation is obtained by the transformation t — > —it in the time 
derivative of the original RT Schrodinger equation: 

^^^\^,{t))=Hit)\^{t)). (89) 

The time dependence of the Hamiltonian does not change. If the Hamiltonian is time- 
independent, we easily see that the excitation amplitude decreases exponentially relative 
to the ground state, 

\^it)) ^Y.^,e-'''' ^ E^^-^"*'^ 1^') = e-*^" 5]c,e-*(--^°) \j) . (90) 

3 3 3 

However, it is not obvious that this feature survives in the time-dependent situation. 

An important aspect of the IT Schrodinger equation is non-unitarity. The norm of 
the wave function is not conserved. Thus, we consider the normalized state vector 

\^[t))^^=^==\^it)). (91) 

The equation of motion for this normalized state vector is 

-^^\^(t))^[Hit)-{Hit))]\m). (92) 
where we defined the expectation value of the Hamiltonian 

{H{t))^{m\Hit)m))- (93) 
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The above equation is not linear but norm-conserving, which makes the asymptotic expan- 
sion easy. In terms of the dimensionless time s = t/r, the norm-conserving IT Schrodinger 
equation is written as 



_d 

ds 



H{s)^{H{s)) i^is) 



(94) 



4.4.2 Asymptotic expansion of the excitation probability 

To prove the conjecture by SteUa et ai, we derive the asymptotic expansion of the exci- 
tation probabihty. The following Theorem provides us with the imaginary-time version 
of the adiabatic theorem. 

Theorem 4.2. Under the same hypothesis as in Theorem \2.1l the state vector following 
the norm-conserving IT Schrodinger equation ^94\ ) has the asymptotic form in the limit 
of large r as 

i^{s))=Y,c,{s)\j{s)), (95) 



co(s)~1 + 0(t-2), 



(96) 
(97) 



Proof. The norm-conserving IT Schrodinger equation ()94|) is rewritten as the equation of 
motion for Cj{s) as 



ds ^ 



ej(s) -efc(s) 



ds 



To remove the second term on the right-hand side, we define 



Cj{s) = exp i''' J ^s 



cj(s), 



(98) 



(99) 



and obtain the equation of motion for Cj{s) as 



d5, ^ ^ dg(s) 



ej{s) - Skis) 



ds 



(100) 



where we defined (t>j{s) = ds'ej{s') for convenience. 

Integration of this equation yields the integral equation for Cj(s). It is useful to 
introduce the following quantity. 



Sis)^ rd~sJ2h{S)^eoiS)]\c^iSW 
•^0 ;#o 



Since the norm of the wave function is conserved, |q(s)P — 1 and therefore 
J2ei{s)Msr\ = eois) + ^ h{s) - e^{s)\ \c,{^sf\. 

Thus, the definition of Cj(s) is written as 



(101) 



(102) 



5j-(s) = e-^'^(")e^['^^('')-'^''(^)lcj-(s). 



(103) 
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Finally we obtain the integral equation for Cj(s): 



co(s) = e-^»+e-^(^) r dge-^(-') y Q(g) 1^^)) ^ ^^^^^ 

Jo eo(s)-£/(s) 

c^.^o(s) = e^^(^)e-^[*^(^)-*«(^)l rdse-^^(^^e^[^^(^~)-'^°(^~)l y Cfe(S)^^^i^i^^^, 

(105) 

where we used the initial condition co(0) — 1 and c^^o = 0. 

The next step is the asymptotic expansion of these integral equations for large r. 
It is expected that Co(s) = 1 and Cj^o(s) — for t —^ oo because of the following 
argument: Since the coefficient Co(s) is less than unity, 6{s) should be 0{t~^) at most 
and e'^''^*'^ = 0(1). The second factor on the right-hand side of pOSp is small exponentially 
with r because 4>j (s) — ipQ (s) is positive and an increasing function of s. Thus, cj^o (s) 
and then co(s) — > 1 owing to the norm conservation law. 

Therefore we estimate the next term of order r^^ under the assumption that co(s) 3> 
Cj^o{s). Since S{s) is proportional to the square of Cj^o{s), we have e'^''^*^ « 1. Thus, 
the e^'^'^^*-' factors can be ignored in the r"^ term estimation of (jlOSp . Consequently, 
evaluation of the integral equations yields 

CjM^) « e-^I'^^t^)-^"^^)! dse-[^^(^~)-*«(^~)l^^^^^^|^ + O (r-^) . (106) 
The excitation amplitude is estimated by integration by parts as 

CjMs) ~ I [a^s) - e-"('^^(")-*«(^»A,(0)] + O (r-2) , (107) 

where Aj{s) is defined by ([H]). The second term in the square brackets is vanishingly 
small, which is a different point from the RT dynamics. From the above expression, we 
find (5(s) = 0{t-^), that is e^-'t^' w 1 + 0{t-^). Therefore, we obtain ^ and ([W]). 
which is consistent with the assumption co(s) 3> Cj^Q{s). □ 



Remark. The excitation probability at the end of a QA process is proportional to t ^ in 
the large t limit: 

|(j(l)|^(l))|'«^|A,(l)|V0(r-3). (108) 



Its difference from the upper bound for the RT dynamics (j64p is only in the absence 
of Aj{0). In the IT dynamics, this term decreases exponentially because of the factor 
g-r(0j(s)-0o(s))^ 'pjjjg result proves the conjecture proposed by Stella et al. 20], that is, 

eiT(r) < eRT(T), (109) 
eiT(T) « eRT(T) (r ^ c»). (110) 

Strictly speaking, the right-hand sides in the above equations denote the upper bound for 
the error rate for RT-QA, not the error rate itself. In some systems, for example, the two 
level system, the error rate oscillates because Aj{{)) and Aj{l) may cancel in (fTT|) . and 
becomes smaller than that of IT-QA at some r. However, QA for ordinary optimization 
problems has different energy levels at initial and final times, and thus such a cancellation 
seldom occurs. 
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Figure 7: The annealing schedules defined in (jllip . /sqi(s) and /sq2('S) have a vanishing slope 
at the initial time s = and the final time s = 1, respectively. 



4.4.3 Numerical verification 

We demonstrate a numerical verification of the above results by simulations of the IT- 
and RT-Schrodinger equations. For this purpose, we consider the following annealing 
schedules (Fig. [7]): 

/sql(s)=s2^ /sq2(s) =S(2-S). (ill) 

The former has a zero slope at the initial time s — and the latter at s = 1. Thus, the 
Aj{0) and Aj{l) terms vanish with /sqi(s) and /sq2(s), respectively. Since the error rate 
for IT-QA depends only on Aj{l), IT-QA with /sq2(s) should show the error rate, 
while RT-QA with /sq2(s) exhibits the r'^-law. On the other hand, RT-QA and IT-QA 
with /sqi(s) should have the same error rate for large r. Figure [8] shows the residual 
energy with two annealing schedules for the spin-glass model presented in Sec. 14.3.21 
which explicitly supports our results. 

5 Convergence condition of QA — Quantum Monte 
Carlo evolution 

So far, we have discussed QA with the Schrodinger dynamics. When we solve the 
Schrodinger equation on the classical computer, the computation time and memory in- 
crease exponentially with the system size. Therefore, some approximations are necessary 
to simulate QA processes for large-size problems. In most numerical studies, stochastic 
methods are used. In this section, we investigate two types of quantum Monte Carlo 
methods and prove their convergence theorems, following |40j . 

5.1 Inhomogeneous Markov chain 

Since we prove the convergence of stochastic processes, it is useful to recall various def- 
initions and theorems for inhomogeneous Markov processes [5]. We denote the space 
of discrete states by S and assume that the size of S is finite. A Monte Carlo step is 
characterized by the transition probability from state x{€ S) to state ?/(g S) at time step 
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Figure 8: The annealing-time dependence of the residual energy for IT- and RT-QA with 
annealing schedules /sqi(s) and /sq2(s). The system is the spin-glass model presented in 
Sec. 14.3.21 The solid lines stand for functions proportional to and t~^. The parameters 
are h = 0.1 and F = 1. 



t: 

G{y x-t) = l^^^'^^^^^'^'*^ (a^T^j;) ^-^2) 

where P{y, x) and A{y, x; t) are called the generation probability and the acceptance prob- 
ability, respectively. The former is the probability to generate the next candidate state y 
from the present state x. We assume that this probability does not depend on time and 
satisfies the following conditions: 

yx,yeS:Piy,x)=P{x,y)>0, (113) 
\/xeS:P{x,x) = 0, (114) 

Va; e 5 : ^P(?/,a;) = 1, (115) 
yes 



yx,y e S,3n > 0,3zi, ■■■ , z„_i e 5 : J]^ P{zk+i,Zk) > 0, zq = x, z„ = y. (116) 

fc=0 

The last condition represents irreducibility of S, that is, any state in S can be reached 
from any other state in S. 

We define Sx as the neighborhood of x, i.e., the set of states that can be reached by 
a single step from x: 

Sx = {y\yeS,Piy,x)>0}. (117) 

The acceptance probability A{y, x; t) is the probability to accept the candidate y gener- 
ated from state x. The matrix G{t), whose {y, x) component is given by pi2p . [G{t)]y^x = 
G{y,x;t), is called the transition matrix. 
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Let V denote the set of probability distributions on S. We regard a probability dis- 
tribution p V) as the column vector with the component [p\x = p{x). The probabihty 
distribution at time t, started from an initial distribution po 'P) at time io, is written 
as 

p{t, to) = G*-*>o = Git - \)G{t - 2) • • • G(io)po. (118) 
A Markov chain is caUed inhomogeneous when the transition probabihty depends 
on time. In the following sections, we will prove that inhomogeneous Markov chains 
associated with QA are ergodic under appropriate conditions. There are two kinds of 
ergodicity, weak and strong. Weak ergodicity means that the probability distribution 
becomes independent of the initial conditions after a sufhciently long time: 

yto>0: limsup{||p(t,<o)-p'(i,<o)|||po,Poe^} = 0, (119) 

where p{t,to) and p'{t,to) are the probability distributions with different initial distribu- 
tions pq and p'q . The norm is defined by 

\\p\\ = Y.\Pi^)\- (120) 

Strong ergodicity is the property that the probability distribution converges to a unique 
distribution irrespective of the initial state: 

3r eVyto>0 : lim sup{\\p{t, to) - r\\\po e V} = 0. (121) 

t — >oo 

The following two Theorems provide conditions for weak and strong ergodicity of an 
inhomogeneous Markov chain 5 . For proofs see Appendix B. 

Theorem 5.1 (Condition for weak ergodicity). An inhomogeneous Markov chain is 
weakly ergodic if and only if there exists a strictly increasing sequence of positive numbers 
{ti}, {i — 0,1,2, . . . ), such that 



5^[l-a(G*^+-*')] — oo, (122) 

i=0 

where Q!(G*'+1'*') is the coefficient of ergodicity defined by 

^((^t.+i,*.-) = l-niin|^min{G(0,a;),G(z,y)} a;,?;e5| (123) 

with the notation G{z,x) — [G*'+^'*']2_j,. 

The coefficient of ergodicity measures the variety of the transition probability. If 
G{z, x) is independent of a state x, a{G) is equal to zero. 

Theorem 5.2 (Condition for strong ergodicity). An inhomogeneous Markov chain is 
strongly ergodic if the following three conditions hold: 

1. the Markov chain is weakly ergodic, 

2. for all t there exists a stationary state pt € V such that pt = G{t)pt, 

3. Pt satisfies 

oc 

<oo. (124) 
Moreover, if p — lim pt, then p is equal to the probability distribution r in il21\) . 



t — >oc 

We note that the existence of the limit is guaranteed by (|124p . This inequality implies 
that the probability distribution pt{x) is a Cauchy sequence: 

Ve > 0,3to > 0,yt,t' > ta : \pt{x) ^ pf {x)\ < e. (125) 
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5.2 Path-integral Monte Carlo method 

Let us first discuss convergence conditions for the implementation of quantum annealing 
by the path- integral Monte Carlo (PIMC) method [H [55]. The basic idea of PIMC 
is to apply the Monte Carlo method to the classical system obtained from the original 
quantum system by the path-integral formula. We first consider the example of ground 
state search of the Ising spin system whose quantum fluctuations are introduced by adding 
a transverse field. The total Hamiltonian is defined in (pO| . Although we only treat 
the two-body interaction for simplicity in this section, the existence of arbitrary many- 
body interactions between the z components of Pauli matrix and longitudinal random 
magnetic field ^/licrf, in addition to the above Hamiltonian, would not change the 
following argument. 

In the path-integral method, the d-dimensional transverse-field Ising model (TFIM) 
is mapped to a (d + l)-dimensional classical Ising system so that the quantum system 
can be simulated on the classical computer. In numerical simulations, the Suzuki- Trotter 
formula [23l [24] is usually employed to express the partition function of the resulting 
classical system, 

(„ M M N \ 

jjJ:T. J^^-^-T + E E ^-f ' (126) 
fc=l k=l i=0 / 

where M is the length along the extra dimension (Trotter number) and cr|'^''(= ±1) 
denotes a classical Ising spin at site i on the fcth Trotter slice. The nearest-neighbour 
interaction between adjacent Trotter slices, 

7(0 = ^log(coth^), (127) 

is ferromagnetic. This approximation (I126P becomes exact in the limit M ^ oo for a fixed 
/3 = l/fcsT. The magnitude of this interaction (|127p increases with time t and tends to 
infinity as t ^ oo, refiecting the decrease of T{t). We fix M and /? to arbitrary large 
values, which corresponds to the actual situation in numerical simulations. Therefore the 
Theorem presented below does not directly guarantee the convergence of the system to 
the true ground state, which is realized only after taking the limits M ^ oo and /3 — > oo. 
We will rather show that the system converges to the thermal equilibrium represented 
by the right-hand side of l|126p , which can be chosen arbitrarily close to the true ground 
state by taking M and (5 large enough. 

With the above example of TFIM in mind, it will be convenient to treat a more general 
expression than (|126p . 

Here Ft){x) is the cost function whose global minimum is the desired solution of the 
combinatorial optimization problem. The temperature Tq is chosen to be sufficiently 
small. The term Fi(x) derives from the kinetic energy, which is the transverse field in 
the TFIM. Quantum fiuctuations are tuned by the extra temperature factor Ti(t), which 
decreases with time. The first term -~Fo{x)/To corresponds to the interaction term in 
the exponent of (|126p . and the second term —Fi{x) /Ti{t) generalizes the transverse-field 
term in (|126p . 



(128) 
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For the partition function (I128p , we define the acceptance probability of PIMC as 

A{y,x;t)=g(^), (129) 



\q{x;t) 

g(.;t) = ^exp(^-^-^j. (130) 

This q{x; t) is the equilibrium Boltzmann factor at a given fixed Ti{t). The function g{u) 
is the acceptance function, a monotone increasing function satisfying < g{u) < 1 and 
g{l/u) = g{u)/u for u > 0. For instance, for the heat bath and the Metropolis methods, 
we have 

9iu) - (131) 
g{u) = mm{l,u}, (132) 

respectively. The conditions mentioned above for g{u) guarantee that q{x;t) satisfies 
the detailed balance condition, G{y, x;t)q{x;t) — G{x,y;t)q(y]t). Thus, q{x;t) is the 
stationary distribution of the homogeneous Markov chain defined by the transition matrix 
G{t) with a fixed t. In other words, q{x; t) is the right eigenvector of G{t) with eigenvalue 
1. 

5.2.1 Convergence theorem for PIMC-QA 

We first define a few quantities. The set of local maximum states of Fi is written as Sm, 

S,n^{x\xe S, Vy e S^, Fi(y) < Fi{x)} . (133) 

We denote by d{y, x) the minimum number of steps necessary to make a transition from 
X to y. Using this notation we define the minimum number of maximum steps needed to 
reach any other state from an arbitrary state in the set S \ Sm , 

R = min|max{c?(y, a;) | y e 5} | x e 5 \ 5„i|. (134) 

Also, Lq and Li stand for the maximum changes of Fq(x) and Fi{x), respectively, in a 
single step, 

Lo = max{|Fo(x) -Fo(y)| | P(y,a;) >0, x,yesY (135) 

ii =max{|Fi(x) -Fi(y)| \ P{y,x) > 0, x.^yes}. (136) 

Our main results are summarized in the following Theorem and Corollary. 

Theorem 5.3 (Strong ergodicity of the system (|128p ). The inhomogeneous Markov chain 
generated by 1129\) and Iil30\) is strongly ergodic and converges to the equilibrium state 
corresponding to the first term of the right-hand side of U3U\) . exp{^ Fo(x) /Tq) , if 



Application of this Theorem to the PIMC implementation of QA represented by (|126p 
immediately yields the following Corollary. 
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Corollary 5.4 (Strong ergodicity of QA-PIMC for TFIM). The inhomogeneous Markov 
chain generated by the Boltzmann factor on the right-hand side of hl26\) is strongly ergodic 
and converges to the equilibrium state corresponding to the first term on the right-hand 
side of mkl if 

rW>f tanh-(^^^. (138) 



Remark. For sufficiently large t, the above inequality reduces to 

r(<) > ^(t + 2)-2/fl^^ (139) 

This result implies that a power decay of the transverse field is sufficient to guarantee 
the convergence of quantum annealing of TFIM by the PIMC. Notice that R is of 0{N^) 
and Li is of 0{N). Thus ([1^5)) is qualitatively similar to (pTjl . 

To prove strong ergodicity it is necessary to prove weak ergodicity first. The following 
Lemma is useful for this purpose. 

Lemma 5.5 (Lower bound on the transition probability). The elements of the transition 
matrix defined by 1129\) and IISC^) have the following lower bound: 

P{y, x) > ^ > : G{y, x;t)>w g{l) exp ( - J^) , (140) 



and 



To T,{t) 



3ti > 0,Va: e 5 \ 5,„,Vt > ti : G{x,x; t) > w g{l) exp ( - ) . (141) 



To Tiit) 

Here, w stands for the minimum non- vanishing value of P{y, x), 

w = min {P{y, x) \ P{y, a;) > 0, x,y S} . (142) 



Proof of Lemma 15. 5i The first part of Lemma 15.51 is proved straightforwardly. Equa- 
tion p40p follows directly from the definition of the transition probability and the prop- 
erty of the acceptance function g. When q(jj;t)/q{x;t) < 1, we have 

Giy,x;t) > ^g f^^) > .,(l)exp (-^ - . (143) 

\i{y;t)J i{x;t) V ^0 Ti{t)j 

On the other hand, if q{y; t)/q{x; t) > 1, 

G{y,x;t) > wg{l) > u; 5(1) exp (^-^ - , (144) 

where we used the fact that both Lq and Li are positive. 

Next, we prove (I14ip . Since x is not a member of Sm, there exists a state y € Sx such 
that Fi{y) — Fi{x) > 0. For such a state y, 

<-) 

because Ti{t) tends to zero as i ^ cxd and < g{u) < u. Thus, for all e > 0, there exists 
ti > such that 

yt>t,:g[e.p^ j j < e. (146) 
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We therefore have 

^P{z,x)A{z,x;t) ^ P{y,x)A{y,x;t) + ^ P{z,x)A{z,x;t) 
zes zes\{v} 

<Piy,x)e+ ^(^'^) 

zeS\{y} 

= l-{l-e)P{y,x), (147) 

and consequently, 

G{x,x;t)> {l-e)P{y,x)>0. (148) 

Since the right-hand side of (|14ip can be arbitrarily small for sufficiently large t, we obtain 
the second part of Lemma 15.51 □ 



Proof of weak ergodicity implied in Theorem 15.31 Let us introduce the following 
quantity 

X* = argmin|max{c?(?;, a;) | G 5} | x G 5 \ iSm|- (149) 

Comparison with the definition of R in (|134p shows that the state x* is reachable by at 
most R transitions from any states. 

Now, consider the transition probability from an arbitrary state x to x*. From the 
definitions of R and a;*, there exists at least one transition route within R steps: 

X = Xo ^ Xi ^ X2 ^ ■ ■ ■ ^ XI ^ Xl+i = ■ ■ ■ = XR = X* . 

Then Lemma 15.51 yields that, for sufficiently large t, the transition probability at each 
time step has the following lower bound: 

G{x^+ux^;t^ R + i) > W5(l)exp - t^tjj^^^^^^^ ■ (150) 

Thus, by taking the product of (|150p from i = Otoi = _R— 1, we have 

G*^*-^(x*,a;) > G{x* ,XR-i;t - l)GixR-i,XB-2;t - 2) ■ ■ - Gixux-^t ~ R) 
R-i , J J 



> Y[ wg{l)exp ( --^ 



1=0 



To Ti{t-R + i] 



where we have used monotonicity of Ti(t). Consequently, it is possible to find an integer 
fco > such that, for all k > ko, the coefficient of ergodicity satisfies 

where we eliminate the sum over z in (jl23p by replacing it with a single term for z = x* . 
We now substitute the annealing schedule (|137p . Then weak ergodicity is immediately 
proved from Theorem 1 5 . 1 1 because we obtain 



£(1 - a{G'^''"~^)) > w^g{lf e.p £ ^ ^ oo. (153) 
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Proof of Theorem 15.31 To prove strong ergodicity, we refer to Theorem l5.2l The con- 
dition 1 has already been proved. As has been mentioned, the Boltzmann factor (jl30p 
satisfies q{t) = G{t)q{t), which is the condition 2. Thus the proof wiU be complete if we 
prove the condition 3 by setting pt — q{t). For this purpose, we first prove that q{x; t) is 
monotonic for large t: 



yt > 0, V.T e : q{x; t + I) > q{x- 1), 

3ti > 0,Vt > ti,Vx e : q{x;t + l) < q{x;t), 

where 5™'" denotes the set of global minimum states of Fi . 

To prove this monotonicity, we use the following notations for simplicity: 



A{x) — exp — 



Foix) 
To 



, B= J2 ^(^)' 



A{x) ^ Fi{x) ^ F^'''. 
If a; S the Boltzmann distribution can be rewritten as 

A{x) 



q{x;t) 



B 



Yl ( - 



A(y) 
Ti{t) 



(154) 
(155) 



(156) 
(157) 

(158) 



A{y) 



Since A(2/) > by definition, the denominator decreases with time. Thus, wc obtain 
(fT54|) . 

To prove ()155p . we consider the derivative of q{x;t) with respect to Ti{t), 

A(y)\ 



A{x 



dq{x;t) 
dTi{t) 



BAix) + J2 (^i(^) - Fiiy))exp - 



Ti{t)J 



My) 



T{tf exp 



A(^ 

Ti{t) 



B 



A(y) 
Ti{t) 



(159) 



Only Fi{x) — Fi{y) in the numerator has the possibility of being negative. However, the 
first term BA{x) is larger than the second one for sufficient large t because exp (— A(j/)/Ti(t)) 
tends to zero as Ti(t) oo. Thus there exists ti > such that dq{x; t)/dT{t) > for all 
t > ti. Since Ti{t) is a decreasing function of t, we have (|155p . 
Consequently, for all t > ti, we have 



\\q{t + 1) ^ q{t)\\ ^ [q{x;t + l)-q{x;t)]- ^ [q{x;t + 1) - q{x;t)] 

= 2 Y [q{x;t + l)-q{x;t)], 

where we used \\q{t)\\ = X^^es;'"" + J^x^s^"'" ^) ~ ^- ti^S'a obtain 



(160) 



^|lg(t + l)-q(t)|l =2 ^ [q(x;oo)-g(a;;ti)] <2|lg(a;;oo)|l =2. (161) 
t=ti xeS'^"" 
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Therefore q{t) satisfies the condition 3: 



£ Mt + 1) - q{t)\\ = J2 hit + 1) - qm + £ Mt + 1) - qim 

t=0 i=0 t=ti 

tl-l 



< E [lk(i + i)ll + lkWII] + 2 

= 2ti + 2<oo, (162) 
which completes the proof of strong ergodicity. □ 

5.2.2 Generalized transition probability 



In Theorem 15.31 the acceptance probability is defined by the conventional Boltzmann 
form, (fT29)) and (fT30)) . However, we have the freedom to choose any transition (accep- 
tance) probability as long as it is useful to achieve our objective since our goal is not 
to find finite-temperature equilibrium states but to identify the optimal state. There 
have been attempts to accelerate the annealing schedule in SA by modifying the transi- 
tion probability. In particular Nishimori and Inoue [31] have proved weak ergodicity of 
the inhomogeneous Markov chain for classical simulated annealing using the probability 
of Tsallis and Stariolo [44]. There the property of weak ergodicity was shown to hold 
under the annealing schedule of temperature inversely proportional to a power of time 
steps. This annealing rate is much faster than the log-inverse law for the conventional 
Boltzmann factor. 

A similar generalization is possible for QA-PIMC by using the following modified 
acceptance probability 

My,x;t) = g{u{y,x;t)) , (163) 
uiy, x; t) = e-[^ofe)-^o(.)l/To + _ i)^iM_iiM] '^'"'^ , (i64) 

where g is a real number. In the limit q ^ I, this acceptance probability reduces to the 
Boltzmann form. Similarly to the discussions leading to Theorem I5.3[ we can prove that 
the inhomogeneous Markov chain with this acceptance probability is weakly ergodic if 

T.(t)>^, 0<c<i^, (165) 

where 5 is a positive constant. We have to restrict ourselves to the case q > I for a tech- 
nical reason as was the case previously [34^ . We do not reproduce the proof here because 
it is quite straightforward to generalize the discussions for Theorem 15.31 in combination 
with the argument of [34j. The result (|165p applied to the TFIM is that, if the annealing 
schedule asymptotically satisfies 

r,o>|„p(-^), («6) 

the inhomogeneous Markov chain is weakly ergodic. Notice that this annealing schedule 
is faster than the power law of (I139p . We have been unable to prove strong ergodicity 
because we could not identify the stationary distribution for a fixed Ti(t) in the present 
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5.2.3 Continuous systems 

In the above analyses we treated systems with discrete degrees of freedom. Theorem 15.31 
does not apply directly to a continuous system. Nevertheless, by discretization of the 
continuous space we obtain the following result. 

Let us consider a system of N distinguishable particles in a continuous space of finite 
volume with the Hamiltonian 

1 ^ 

^-^E^^ + nW). (167) 

The mass m{t) controls the magnitude of quantum fluctuations. The goal is to find the 
minimum of the potential term, which is achieved by a gradual increase of m{t) to infinity 
according to the prescription of QA. After discretization of the continuous space (which 
is necessary anyway in any computer simulations with finite precision) and an application 
of the Suzuki- Trotter formula, the equilibrium partition function acquires the following 
expression in the representation to diagonalize spatial coordinates 



m - Tr exp L^tv ({.f '}) - ^ £ E 1^^'' " 1^ 



2(3 

k=l ^ i=l fc=l 



with the unit h — 1. Theorem 15.31 is applicable to this system under the identification 
of Ti{t) with m{t)^^ . We therefore conclude that a logarithmic increase of the mass 
suffices to guarantee strong ergodicity of the potential-minimization problem under spatial 
discretization. 

The coefficient corresponding to the numerator of the right-hand side of (|137p is 
estimated as 

RLi w M^NL^/P, (169) 



where L denotes the maximum value of 



^(fe+i) 



To obtain this coefficient, let us 



consider two extremes. One is that any states are reachable at one step. By definition, 
R = 1 and Li « M'^NL'^/f], which yields p69p . The other case is that only one particle 
can move to the nearest neighbor point at one time step. With a (<C L) denoting the 
lattice spacing, we have 

Mr,. .91 MLa 

Since the number of steps to reach any configurations is estimated as R ~ NML/a, we 
again obtain (|169[) . 

5.3 Green's function Monte Carlo method 

The path-integral Monte Carlo simulates only the equilibrium behavior at finite temper- 
ature because its starting point is the equilibrium partition function. Moreover, it follows 
an artificial time evolution of Monte Carlo dynamics, not the natural Schrodinger dy- 
namics. An alternative approach to improve these points is the Green's function Monte 
Carlo (GFMC) method [ill US HSl gT] . The basic idea is to solve the imaginary-time 
Schrodinger equation by stochastic processes. In the present section we derive sufficient 
conditions for strong ergodicity in GFMC. 

The evolution of states by the imaginary-time Schrodinger equation starting from an 
initial state |'0o) is expressed as 

m)) = Texp (~ f dt'Hit')] lijo), (171) 
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where T is the time-ordering operator. The right-hand side can be decomposed into a 
product of small-time evolutions, 



= lim Goitn-l)Go{tn-2) ' ' ' Gq 1 )Go (^0 ) I V'o) , (172) 
n — >oo 

where tk = kAt, At — t/n and Go{t) = 1 — At-H{t). In the GFMC, one approximates the 
right-hand side of this equation by a product with large but finite n and replaces Go{t) 
with Gi{t) = 1 — At{H{t) — Et), where Et is called the reference energy to be taken 
approximately close to the final ground-state energy. This subtraction of the reference 
energy simply adjusts the standard of energy and changes nothing physically. However, 
practically, this term is important to keep the matrix elements positive and to accelerate 
convergence to the ground state as will be explained shortly. 

To realize the process of (|172p by a stochastic method, we rewrite this equation in a 
recursive form, 

"ipk+iiy) ^y^6i{y,x;tk)il'k{x), (173) 



where ipkix) = (ajjV'fc) and |a;) denotes a basis state. The matrix element of Green's 
function is given by 

Gi{y,x;t) - {y\l-At[H{t)-ET]\x). (174) 

Equation (|173p looks similar to a Markov process but is significantly different in sev- 
eral ways. An important difference is that the Green's function is not normalized, 
X]y Gi(?/, x;t) ^ I. In order to avoid this problem, one decomposes the Green's function 
into a normalized probability Gi and a weight w: 

Gi{y,x;t) = Gi{y,x;t)w{x;t), (175) 

where 

^/ ,x Gi{y,x;t) Gi{y,x;t) 

Gi{y,x;t) = -, w{x;t) = — -. (176) 

J2yGi{y,x;t) Gi{y,x-t) 

Thus, using p73p . the wave function at time t is written as 

V'n(y) = X! h,x„w{Xn-l\tn-l)w{Xn-2\tn-2) ' ' • w(xo;to) 

X Gl{Xn,Xn-l]tn-l)Gl{Xn-l,Xn-2\tn-2) ' ' ' Gl{xi,Xo] to)ll^o{xo). (177) 

The algorithm of GFMC is based on this formula and is defined by a weighted random 
walk in the following sense. One first prepares an arbitrary initial wave function V'o (2^0)7 
all elements of which are non-negative. A random walker is generated, which sits initially 
{t = to) at the position xq with a probability proportional to ipQ{xo). Then the walker 
moves to a new position xi following the transition probability Gi{xi, xo'jtg). Thus 
this probability should be chosen non-negative by choosing parameters appropriately 
as described later. Simultaneously, the weight of this walker is updated by the rule 
Wi — w{xo;to)Wo with Wo = 1. This stochastic process is repeated to t — tn~i- One 
actually prepares M independent walkers and let those walkers follow the above process. 
Then, according to (jl77p . the wave function ipn{y) is approximated by the distribution 
of walkers at the final step weighted by Wn, 

1 *^ 

^"(^)-Ji-LME^i^^^..«' (178) 



M^oo M 

1=1 



where i is the index of a walker. 
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As noted above, Gi{y,x;t) should be non-negative, which is achieved by choosing 
sufficiently small At {i.e. sufficiently large n) and selecting Et within the instantaneous 
spectrum of the Hamiltonian H{t). In particular, when Et is close to the instantaneous 
ground-state energy of H{t) for large t {i.e. the final target energy), Gi{x,x;t) is close 
to unity whereas other matrix components of Gi{t) are small. Thus, by choosing Et this 
way, one can accelerate convergence of GFMC to the optimal state in the last steps of 
the process. 

If we apply this general framework to the TFIM with the cr^-diagonal basis, the matrix 
elements of Green's function are immediately calculated as 

-Et] {x^y) 

Gi{y,x;t) ~ Atr(t) {x and y differ by a single-spin flip) (179) 

(otherwise), 

where Eq{x) = {x\ ■-^ij'^i'^j^ l'^)- C)ne should choose At and Et such that 1 — 

At{Eo{x) — Et) > for all x. Since w{x, t) = Gi(j/, x; t), the weight is given by 

w{x;t) = 1 - At[Eo{x) - Et] + NAtr{t). (180) 

One can decompose this transition probability into the generation probability and the 
acceptance probability as in (|112[) : 

P{y,^) = \f (181) 

I (otherwise) 

NAtr{t) , , 

''^^^^■^'^= l-AtiE,{x)-ET]+NAtT{ty ^'''^ 
We shall analyze the convergence properties of stochastic processes under these probabil- 
ities for TFIM. 

5.3.1 Convergence theorem for GFMC-QA 

Similarly to the QA by PIMC, it is necessary to reduce the strength of quantum fluc- 
tuations slowly enough in order to find the ground state in the GFMC. The following 
Theorem provides a sufficient condition in this regard. 

Theorem 5.6 (Strong ergodicity of QA-GFMC). The inhomogeneous Markov process 
of the random walker for the QA-GFMC of TFIM, HTWj) . UJT]) and fUD, is strongly 
ergodic if 

T{t) > , ^ , , 0<c<— . (183) 

The lower bound of the transition probability given in the following Lemma will be 
used in the proof of Theorem 15.61 

Lemma 5.7. The transition probability of random walk in the GFMC defined by 
il81\) and il82\) has the lower bound: 

P{y, x)>0^yt>0:G,{y, x; t) > , _ (^^^^^^ 'j^ ^ r(^) ' ^'''^ 

3t, > 0,Vt > : G,{x,x;t) > , _ ^j^Jl^^^^^ ^ ^^,j^^,y ^ 
where i?min is the minimum value of Eq{x) 

E^in=inm{Eo{x)\xeS}. (186) 
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Proof of Lemma 15.71 The first part of Lemma 15.71 is trivial because the transition 
probability is an increasing function with respect to Eq{x) when P(y,x) > as seen in 
P82|) . Next, we prove the second part of Lemma [5Jl According to (|179p and (jlSOp . 

Gi{x,x;t) is written as 



Gi{x, x;t) = 1 — 



NAtT{t) 



1 - At [Eo{x) - Et] + NAt T{t) ' 



(187) 



Since the transverse field T{t) decreases to zero with time, the second term on the right- 
hand side tends to zero as t ^ oo. Thus, there exists ti > such that Gi{x, x;t) > 1 — e 
for Ve > and Vt > ii. On the other hand, the right-hand side of (|185p converges to zero 
as t oo. We therefore have (|185p . □ 



Proof of Theorem 15.61 We show that the condition (|183p is sufficient to satisfy the 
three conditions of Theorem 15.21 

1. From Lemma l5.7( we obtain a bound on the coefhcient of ergodicity for sufficiently 
large k as 

N 

(188) 



1 — a(Gi ) > 



AtT{kN-l) 



1 - At [E^i^ - Et) + NAt r{kN - 1) 



in the same manner as we derived (jl52p . where we used R — N. Substituting the 
annealing schedule (|183p . we can prove weak ergodicity from Theorem 15.11 because 



kN,kN-N^ 



k=l 



> 



E 

k — kr\ 



{kN} 



:N 



(189) 



which diverges when < c < 1/A^. 

2. The stationary distribution of the instantaneous transition probability Gi{y,x;t) 

is 

_ w{x;t) 1 AtEo{x) 



q{x\t) 



Eooes^i^^'t) 2^ 2^[l + AtET + NAtr{t)]'' 



(190) 



which is derived as follows. The transition probability defined by pi2p . (|18ip and (I182p 
is rewritten in terms of the weight (|180p as 



Gi{y,x;t) 



^ NAtT{t) 

w{x; t) 
AtV{t) 

w{x; t) 




{x = y) 

{x E Sy \ single-spin flip) 



(191) 



Thus, we have 



E Gi{y,x;t)q{x;t) = 
xes 



1 - 



NAtT{t) 



w{y;t) 



(otherwise) . 

■w{y;t) 
A 



E 



Atr(t) w{x;t) 



= q{y;t) 

where A denotes the normalization factor 



xes, 

NAtT{t) AtT{t) 



A 



A 



w{x; t) 

El' 



A 



(192) 



w{x; t) = Tr 

xeS 



1-At |-^J.,a,^a|-i?, 



NAtT{t) 



[1 + AtET + NAtT{t)]. 



(193) 
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where we used Tr^ Jijafaj = 0. Since the volume oiSy is N, (jl92|) indicates that q{x; t) 
is the stationary distribution of Gi{y, x; t). The right-hand side of p90p is easily derived 
from the above equation. 

3. Since the transverse field T{t) decreases monotonically with t, the above stationary 
distribution q{x; t) is an increasing function of t if Eo{x) < and is decreasing if Eq > 0. 
Consequently, using the same procedure as in (|160p . we have 

\\qit + l)-qit)\\=2 J2 [qi^;t + l)-qix;t)], (194) 

Eo(x)<0 

and thus 

oo 

^||g(i+l)-g(<)|| =2 ['Z(a:;oo)-g(x;0)] <2. (195) 

t=0 Eo{x)<0 

Therefore the sum J^tZo W^i^ + 1) ^ 9(^)11 finite, which completes the proof of the 
condition 3. □ 



Remark. Theorem 15.61 asserts convergence of the distribution of random walkers to the 
equilibrium distribution (|190p with T{t) —^ 0. This implies that the final distribution 
is not delta-peaked at the ground state with minimum Eq{x) but is a relatively mild 
function of this energy. The optimality of the solution is achieved after one takes the 
weight factor w{x;t) into account: The repeated multiplication of weight factors as in 
(|177p . in conjunction with the relatively mild distribution coming from the product of 
Gi as mentioned above, leads to the asymptotically delta-peaked wave function ipn{y) 
because w{x; t) is larger for smaller Eo{x) as seen in (|180p . 



5.3.2 Alternative choice of Green's function 

So far we have used the Green's function defined in (|174p , which is linear in the transverse 
field, allowing single-spin flips only. It may be useful to consider another type of Green's 
function which accommodates multi-spin flips. Let us try the following form of Green's 
function. 




(196) 



which is equal to Go{t) to the order At. The matrix element of G2{t) in the a^-diagonal 
basis is 

G2(y,x;t) =cosh^(Atr(t))tanh*(Atr(t))e-'^*^°("), (197) 

where S is the number of spins in different states in x and y. According to the scheme 
of GFMC, we decompose 6*2(2/, a;;t) ii^to the normalized transition probability and the 
weight: 

N 

ta.nh\ At T{t)), (198) 

t(;2(x;t) =e^*~rWe-^*^«(-). (199) 

It is remarkable that the transition probability G2 is independent of Eq{x), although it 
depends on x through S. Thus, the stationary distribution of random walk is uniform. 
This property is lost if one interchanges the order of the two factors in (I196P . 
The property of strong ergodicity can be shown to hold in this case as well: 



G2iy,x;t) 



cosh(Air(i)) 
gAtr(t) 
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Theorem 5.8 (Strong ergodicity of QA-GFMC 2). The inhomogeneous Markov chain 
generated by il98\) is strongly ergodic if 



m>-^log(l-26(t + l)-i/^ 



Remark. For sufficiently large t, the above annealing schedule is reduced to 

r(^) > ' 



At (i + f)i/JV 



(200) 



(201) 



Since the proof is quite similar to the previous cases, we just outline the idea of the 
proof. The transition probability G2(y, x; t) becomes smallest when S = N. Consequently, 
the coefficient of ergodicity is estimated as 



l-a(G 



t+i,t\ 



> 



I _ Q-2Atr(t) 



N 



We note that R is equal to 1 in the present case because any states are reachable from 
an arbitrary state in a single step. From Theorem 15. 11 the condition 



1 



-2Atr(t) 



N 



> 



b' 

t~Ti 



(202) 



is sufficient for weak ergodicity. From this, one obtains (|200p . Since the stationary 
distribution of G2{y, x; t) is uniform as mentioned above, strong ergodicity readily follows 
from Theorem 15.21 

Similarly to the case of PIMC, we can discuss the convergence condition of QA- 
GFMC in systems with continuous degrees of freedom. The resulting sufficient condition 
is a logarithmic increase of the mass as will be shown now. The operator G2 generated 
by the Hamiltonian (|167p is written as 



G2{t) = exp 



At 



N 



-AtV{{r,}) 



Thus, the Green's function is calculated in a discretized space as 

N 



G2{y,x;t) cx exp 



2m{t) ^ 
I 

m{t) 
2At 



Y,\r'.-n\-AtV{{n}) 



(203) 



(204) 



where x and y represent {rt} and {r[}, respectively. Summation over y, i.e. integration 
over {r[}, yields the weight w(x;t), from which the transition probability is obtained: 



u;(a;;t) oce-^*^«'--», 
G2{y, x; t) oc exp I ^Wi- 



(205) 
(206) 



The lower bound for the transition probability depends exponentially on the mass: G2{y, x; t) > 
g-Cm(t)^ Since 1 — a(G2^^'*) has the same lower bound, the sufficient condition for weak 
ergodicity is e~*^™*-*^ > (t + 1)^^, which is rewritten as 



m{t) < G"Mog(t + 1). 



(207) 



The constant G is proportional to NL'^/At, where L denotes the maximum value of 
\r' — r\. The derivation of C is similar to (|169p . because G2{t) allows any transition to 
arbitrary states at one time step. 
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6 Summary and perspective 



In this paper we have studied the mathematical foundation of quantum anneahng, in 
particular the convergence conditions and the reduction of residual errors. In Sec. [2l we 
have seen that the adiabaticity condition of the quantum system representing quantum 
annealing leads to the convergence condition, i.e. the condition for the system to reach 
the solution of the classical optimization problem as i — *■ oo following the real-time 
Schrodinger equation. The result shows the asymptotic power decrease of the transverse 
field as the condition for convergence. This rate of decrease of the control parameter 
is faster than the logarithmic rate of temperature decrease for convergence of SA. It 
nevertheless does not mean the qualitative reduction of computational complexity from 
classical SA to QA. Our method deals with a very generic system that represents most of 
the interesting problems including worst instances of difficult problems, for which drastic 
reduction of computational complexity is hard to expect. 

Section [3] reviews the quantum-mechanical derivation of the convergence condition of 
SA using the classical-quantum mapping without an extra dimension in the quantum sys- 
tem. The adiabaticity condition for the quantum system has been shown to be equivalent 
to the quasi-equilibrium condition for the classical system at finite temperature, repro- 
ducing the well-known convergence condition of SA. The adiabaticity condition thus leads 
to the convergence condition of both QA and SA. Since the studies of QAE often exploits 
the adiabaticity condition to derive the computational complexity of a given problem, the 
adiabaticity may be seen as a versatile tool traversing QA, SA and QAE. 

Section |4] is for the reduction of residual errors after finite-time quantum evolution of 
real- and imaginary-time Schrodinger equations. This is a different point of view from 
the usual context of QAE, where the issue is to reduce the evolution time (computational 
complexity) with the residual error fixed to a given small value. It has been shown 
that the residual error can becomes significantly smaller by the ingenious choice of the 
time dependence of coefficients in the quantum Hamiltonian. This idea allows us to 
reduce the residual error for any given QAE-based algorithm without compromising the 
computational complexity apart from a possibly moderate numerical factor. 

In Sec. [5] we have derived the convergence condition of QA implemented by Quantum 
Monte Carlo simulations of path-integral and Green function methods. These approaches 
bear important practical significance because only stochastic methods allow us to treat 
practical large-size problems on the classical computer. A highly non-trivial result in 
this section is that the convergence condition for the stochastic methods is essentially 
the same power-law decrease of the transverse-field term as in the Schrodinger dynamics 
of Sec. [2] This is surprising since the Monte Carlo (stochastic) dynamics is completely 
different from the Schrodinger dynamics. Something deep may lie behind this coincidence 
and it should be an interesting target of future studies. 

The results presented and/or reviewed in this paper serve as the mathematical foun- 
dation of QA. We have also stressed the similarity /equivalence of QA and QAE. Even 
the classical SA can be viewed from the same framework of quantum adiabaticity as long 
as the convergence conditions are concerned. Since the studies of very generic properties 
of QA seem to have been almost completed, fruitful future developments would lie in the 
investigations of problems specific to each case of optimization task by analytical and 
numerical methods. 
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A Hopf 's inequality 

In this Appendix, we prove the inequality (P^ . Although Hopf [35] originally proved this 
inequality for positive linear integral operators, we concentrate on a square matrix for 
simplicity. 

Let M be a strictly positive m x m matrix. The strict positivity means that all the 
elements of M are positive, namely, Mij > for all i, j, which will be denoted by M > 0. 
Similarly, M > means that Mtj > for all i, j. We use the same notation for a vector, 
that is, f > means that all the elements Vi are positive. 

The product of the matrix M and an m-element column vector v is denoted as usual 
by Mv and its ith element is 

m 

(Mv), = M.jVj. (208) 

The strict positivity for M is equivalent to 

Mv > if v>0, v^O, (209) 

where denotes the zero-vector. Of course, if t> = 0, then Mv = 0. 

Any real-valued vector v and any strictly positive vector p > satisfy 

. V . {Mv), (Mv)., V, 

mm — < mm — < max — < max — , (210) 

i Pi t {Mp)i i [Mp).i I Pi 



because 



{Mv), ~ (^niin ) {Mp), = ^ M, 



max ^ {Mp), - {Mv), = ^ M,^ 



/ Vi 

max — 

V ' Pi 



- V, 



>0, (211) 
> 0. (212) 



The above inequality implies that the difference between maximum and minimum of 
{Mv)i/ {Mp)i is smaller than that of Vi/pi. Following [35|, we use the notation, 

Vi Vi , Vi 

osc — = max mm — , (213) 

i Pi I Pi i p, 

which is called the oscillation. For a complex-valued vector, we define 

oscvi = sup oscRe(?7tii). (214) 

\v\=i ' 

It is easily to derive, for any complex c, 

osc {cvi) = \c\ oscvi. (215) 

i i 

We can also easily prove that, if osci Vi — 0, Vi does not depend on i. 
We suppose that the simple ratio of matrix elements is bounded, 

< K for all i, j, k. (216) 



jk 
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This assumption is rewritten by the product form as 

{Mv 



[Mv) 



<K, v>0, v^O, (217) 



for all i, j and such v. The following Theorem states that the inequality (I210|) is sharpened 
under the above additional assumption (|217p . 

Theorem A.l. If M satisfies the conditions \209i) and \211l^ , for any p > and any 
complex-valued v, 

(Mv), K-1 V, 
osc- — < osc — . 218 

r {Mp), - K+1 i 

Proof. We consider a real- valued vector v at first. For fixed i, j and fixed p > 0, we 
define Xk by 

{Mv), {Mv), 



J^XkVk. (219) 



(Mp), (Mp), 

We do not have to know the exact form of Xk = Xk{i,j,p). When v = ap, the left-hand 
side of the above equation vanishes, which implies XkPk = 0. Thus, we have 

{Mv), {Mv), A , , 

Now we choose ^ 

a = min— , 6 = max— . (221) 

i Pi i Pi 

Since Vk — apk = {b — a)pk — {bpk — Vk) , Vk — apk takes its minimum at = apk and 
its maximum {b — a)pk at Vk — bpk- Therefore, the right-hand side of (j220p with p given 
attains its maximum for 

V = ap - bp+ =ap+{b- a)p+, (222) 

where we defined 

=\0 (X.>0)' -\p, {x,>o)- ^'''^ 

Consequently, we have 

'{Mp+), {Mp-^ 



{Mv), {Mv)j ^ 
{Mp), {Mp), - 



{b-a). (224) 



{Mp), {Mp), 

Since, by assumptions, M > and p > 0, we have 

Mp- > 0, Mp+ > 0, Mp = Mp- + Mp+ > 0. (225) 

Moreover, Mp- > if p ^ and Afp+ > if p+ ^ 0. In cither case, namely, p = 
or p+ = 0, the expression in the square brackets of (|224[) vanishes because p+ is equal to 
either p or 0. Thus, we may assume that both Mp- > and Mp+ > 0. Therefore the 
expression in inequality ()224p is rewritten as 

{Mp+), {Mp+), 1 l_ {Mp-), _ {Mp-), 

{Mp), {Mp), -1-t-t l-ff - {Mp+),' ~ {Mp+),- ^ 
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Since, from the assumption ()217p . t and t' are bounded from k ^ to k, we find t' < nt^ , 
which yields 

{Mp+), _ {Mp+), ^ ^ 1_ 

{Mp), {Mp)j - \ + t \ + kH' ^ ' 

For t > 0, the right-hand side of the above inequahty takes its maximum value [n — 
1)/{k + 1) at t = Finally, we obtain 

_ (M!!)l< osc ^ (228) 

(Mp), iMp)j - K+1 ^ p, ^ ^ 

for any i, j. Hence it holds for the sup of the left-hand side, which yields ()218p . 

For a complex- valued vector v, we replace Vi by Re {rjVi). Since MRe {rjv) — Re (jjMv), 
the same argument for the real vector case yields 

oscRe (v^^) < ^oscRe f,^) . (229) 

Taking the sup with respect to 77, |?7| = 1, on both sides, we obtain (|218p . □ 

We apply this Theorem to the eigenvalue problem, 

Mv = Xv. (230) 

The Perron-Frobenius theorem states that a non-negative square matrix, M > 0, has a 
real eigenvalue Xq satisfying |A| < Aq for any other eigenvalue A. This result is sharpened 
for a strictly positive matrix, M > 0, as the following Theorems. 

Theorem A. 2. Under the hypotheses \209\) and \21'1^ , the eigenvalue equation \23(]\) has 
a positive solution X = Xq > 0, v = q > 0. Moreover, for any vector p (p > 0, p ^ 0), 
the sequence 

M"p , , 

with k fixed, converges toward such q. 

Theorem A. 3. Under the same hypotheses, I12S0\) has no other non-negative solutions 
than X ^ Xq, V ~ cq. For A = Aq, h23U\) has no other solutions than v ^ cq. 

Theorem A. 4. Under the same hypotheses, any (complex) eigenvalue X ^ Xq of \230\) 
satisfies 

\X\ < ^Ao. (232) 

Remark. We note that the factor (k— 1)/(k + 1) is the best possible if there is no further 
condition. For example. 

M = ( M , K > 0, (233) 



has eigenvalues Aq = k + 1 and X ~ n — 1. 

Proof of Theorem IA.21 Let us consider two vectors p, p which are non-negative and 
unequal to 0, and define 

Pn+i = Mp„, p„+i = Mp„, po = p, po = p. (234) 

From the hypothesis (j209p . both p„ and p„ are strictly positive for n > 0. We find by 
repeated applications of Theorem I A . 1 1 that . for n > 1, 

Pn,i ^ f'^^'^y ^ Pl-i /ogc\ 

osc — - < osc — -, (235) 

i Pn,i \K.+ 1J I 
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where we used the notation pn^i ~ {Pn)i- Consequently, there exists a finite constant 
A > 0, such that 

^ — > A (n — > oo) (236) 

for every i. We normalize the vectors p^, as 

Qn = ^, = (237) 

PrL,k Pn^k 

with k fixed. The hypothesis (|217p imphes that 

< q-n.i < K, < < 't- (238) 

Thus, we find that 



I _ Pn^ 
Pn,k 



Pn,i Pn,k 

Pn,i Pn,k 



, Pn,k Pn,i /r<on^ 
<K^^OSC -. (239) 



Pn,k * J'n,: 

Now we specialize to the case that p = Alp = Pi , namely, 

Pn = Mp„ = q„ = q„+i. (240) 

Using (|235p and (|236p . we estimate (|239p for ^ 9n.i7 which implies that the sequence 

q„ converges to a limit vector q. Because of (|238p . we have q > 0. Now (|236p reads 

Pn+hr {Mpji {Mq.J, 

Consequently, Mqr = Ao^. For any other initial vector p, the sequence q„ converges to the 
same limit as q„ because of p35p . ()236|) and (j239p . Theorem lA. 21 is thereby proved. □ 

Proof of Theorem IA.3i We assume that d > 0, 7^ is a solution of the eigenvalue 
equation ()230|) . Since the hypothesis (j209p implies Mv > 0, we have A > and v > 0. We 
use this V as an initial vector p in Theorem IA.2I and apply the last part of this Theorem 
to 

= — . (242) 

{M^v)k XnVk Vk 

Hence, the limit q is equal to v/vk, that is, v = cq, and A = Aq. Therefore the first part 
of Theorem lA. 31 is proved. 

Next, we take Aq > and q > from Theorem IA.2I and consider a solution of 
Mv = At). The application of Theorem lA.ll to q and v yields 

|A| V, Xv, {Mv)i ^ K-l Vi 

— osc — = osc = osc , < osc — , (243) 

Aq i qi i AoQi i {Mq)i K+1 I 

where we used (|215p . If A = Aq, the above inequality implies that osc^ Vi / qi = or v = cq, 
which provides the second part of Theorem IA.3I □ 

Proof of Theorem IA.41 We consider (I243p . If A 7^ Aq and v 0, v can not be equal 
to cq. Therefore osavi/ qi > 0, and then (piSj) yields (f2^ . □ 



B Conditions for ergodicity 

In this Appendix, we prove Theorems 15.11 and 15.21 which provide conditions for weak and 
strong ergodicity of an inhomogcneous Markov chain [S|. 
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B.l CoefRcient of ergodicity 

Let us recall the definition of the coeflicient of ergodicity 



a{G) = 1— min < > mm{G{z,x),G{z,y)} > 



First, we prove that this coefficient is rewritten as 



Thus, we find 



(244) 



.(G) = i max I J2 \G{z, x) - G{z, y)\\ . (245) 



(246) 



Proof of (|245p . For fixed x,y G S, we define two subsets of S by 

S^ = {zeS\ G{z, x) - G{z, y) > 0}, 
5g = {z e 5 I G(z, x) - G{z, y) < 0}. 

Since the transition matrix satisfies J2yes ^iv^ x) — 1, we have 

J2 {G{z, x) - G{z, y)] = [l - E G{Z' ^)] - - E y) 

= -J2[Giz,x)-G{z,y)]. (247) 



i ^ |G(z, :r) - G(z, y)| = ^ [G(2, a.) - G(z, y)] 

= ^ max {0, G(z, x) - G(z, y)} 

= X! t'^^^' ^) ~ niin{G(z, x), G{z, y)}] 
zes 

= l-^min{G(z,a:),G(z,j/)}, (248) 

zes 

for any x, y. Hence taking the max with respect to x, y on both sides, we obtain (I245|) . □ 

To derive the conditions for weak and strong ergodicity, the following Lemmas are 
useful. 

Lemma B.l. Let G be a transition matrix. Then the coefficient of ergodicity satisfies 

< a(G) < 1. (249) 

Lemma B.2. Let G and H be transition matrices on S . Then the coefficient of ergodicity 
satisfies 

a(GH) < a{G)a{H). (250) 
Lemma B.3. Let G be a transition matrix and H be a square matrix on S such that 

Y,H{z,x)^0, (251) 
zes 
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for any x £ S. Then we have 

\\GH\\ < a{G)\\Hl 
where the norm of a square matrix defined by 



(252) 



(253) 



Proof of Lemma IB.li The definition of a{G) implies a{G) < 1 because G{y,x) > 0. 
From (Pi5)) . a{G) > is straightforward. □ 

Proof of Lemma IB.2i Let us consider a transition matrix G, a column vector a such 
that J2zgS "('^) ~ their product b — Ga. We note that the vector 6 satisfies 

J2zes ^(•^) ^ ^ because 



E^(^'y) 



We define subsets of S by 

5+ = {z e 5 I a(z) > 0}, S- ={zcS \ a{z) < 0}, 
5+ = {z e 5 I 6(z) > 0}, {z e 5 I b{z) < 0}. 

Since X^^es a{z) = J^zes K^) = 0' find 

E = E "(^) - E '^(^) = 2 E "(^) = -2 ^ «(^)' 



ze5 



2 65^ 



zes; 



zesz 



^|5(z)| = 2 ^ 6(z) = -2 ^ b{z). 



(254) 

(255) 

(256) 
(257) 



zes 



Z£S;, 



Therefore, we obtain 

^|&(^)| = 2 ^ J2Giz,u)aiu) 



2E 

ues+ 



E ^(^'") 



aiu) + 2 ^ 



E G(^- 



a(7i) 



<2max^ E Giz,v)\ ^ a(^.) + 2 mm <j ^ Giz,w)\ ^ a(zi) 



= max <^ V [G{z, v) - G(z, t«)] i V |a(w)| 

< max < max {0, G'(z, u) — G{z, w)} > |a(w)| 

= ^ ™ax i V \G{z,v) ~ G{z,w)\ \ V \a{u)\ 
Ue5 ) u£S 



^aiG)J2\aiu)\., 
ues 



(258) 
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where we used ([248]) and ([2451) . 

Next, we consider transition matrices G,H and F = GH. We take a{z) — H{z,x) — 
H{z,y), and then (|258p is rewritten as 

Y,\F{z,x) - F{z,y)\ < aiG)Y,\Hiu,x) ^ Hiu,y)l (259) 

for any x,y. Hence this inequality holds for the max of both sides with respect to x,y, 
which yields Lemma [8.21 □ 



Proof of Lemma IB.3i Let us consider F = GH. We can take a{z) = H{z, x) in (|258p 
because of the assumption X]i;g5 ^iv^ ^) ~ ^- Thus we have 

J2\F{z,x)\<a{G)J2\Hiu,x)\, (260) 

for any x. Hence this inequality holds for the max of both sides with respect to x, which 
provides Lemma [8.31 □ 



B.2 Conditions for weak ergodicity 

The following Theorem provides the reason why a{G) is called the coefficient of ergodicity. 

Theorem B.4. An inhomogeneous Markov chain is weakly ergodic if and only if the 
transition matrix satisfies 

lim a (G*'') = (261) 

t—*oo 

for any s > 0. 

Proof. We assume that the inhomogeneous Markov chain generated by G{t) is weakly 
ergodic. For fixed x,y d S, we define probability distributions by 

pAz)-i^ ^'^"^^ , P.W-I' ^'^^^ . (262) 
I (otherwise) I (otherwise) 

Since Px{t, s; z) — J2ues G'^'^'i^i u)px{u) — G*'''{z, x) and Py{t, s; z) — G*'''{z, y), we have 
^ |G*'^(^, x) - G''%z, y)\=Yl \P-^*' ^' - Pv'^*^ 

<snp{\\p{t,s)~p'{t,s)\\ \po,PoeV}. (263) 
Taking the max with respect to x,y on the left-hand side, we obtain 

2a{G'^')<snp{\\p{t,s)-p'{t,s)\\ \po,p'o^V}. (264) 

Therefore the definition of weak ergodicity (|119p yields (|261|) . 

We assume (|26ip . For fixed po,qo & 'P, we define the transition probabilities by 

H = {pa,qo, ■ ■ ■ ,(?o), (265) 
F ^ G'''H = ip{t, s),q{t, s), • • • , q{t, s)), (266) 

where p{t,s) — G*'''po, q{t,s) — G*^^go- From (|245p . the coefficient of ergodicity for F is 
rewritten as 

= ^I]|p(i:s;z)-g(t,s;z)| = ^\\p{t,s) ~ q{t,s)\\. (267) 
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Thus Lemmas ET] and MM yield 

\\p{t, s) - q{t, 5)11 < 2«(G*'^)a(F) < 2a(G*'^). (268) 
Taking the sup with respect to pQ,qQ (z S and the limit t oo, we obtain 

lim sup{||p(i, s) - q{t, s)\\ \ po, go e V} < 2 lim a{G''') = 0, (269) 

t — ^oo t — *oo 

for any s > 0. Therefore the inhomogeneous Markov chain generated by G{t) is weakly 
ergodic. □ 

Next, we prove Theorem 15. II For this purpose, the following Lemma is useful. 
Lemma B.5. Let Aq, ai, • • • , a„, • • • be a sequence such that < < 1 for any i. 

oo oo 

J2a^^oo =^ J|(l-a.) = 0. (270) 

i—0 i—n 

Proof. Since < 1 — < e~^% we have 

m in / m \ 

0< n(l-«') < n^"°' <exp -^aj . (271) 

i—n i—n \ i—n / 

In the limit m — > cx), the right-hand side converges to zero because of the assumption 
Si^o o^i — CO. Therefore we obtain (|270p . □ 

Proof of Theorem 15.11 We assume that the inhomogeneous Markov chain generated 
by G{t) is weakly ergodic. Theorem IB .41 yields 

lim [l - a(G*''')] = 1 (272) 

t^oo 

for any s > 0. Thus, there exists ti such that 1 — Q!(G*1'*") > 1/2 with io — s. Similarly, 
there exists tn+i such that 1 — Q!(G*"+1'*") > 1/2 for any t„ > 0. Therefore, 

n 

^[l-a(G*'--*0] > 2 (" + !)• (273) 

1=0 

Taking the limit n ^ oo, we obtain (|122p . 
We assume (I122p . Lemma FB . 5 1 yields 

oo oo 

J|{1- [l-a(G*-+i'*')]} = n^^*^*"^''*") = 0- (274) 

i—n i—n 

For fixed s and t such that t > s > 0, we define n and m by t„_i < s < tn, tm < t < tm+i- 
Thus, from Lemma lB.21 we obtain 

a(G*'") <a(G*'*'")a(G*'-*"-i)---a(G*"+i'*")a(G*"'^) 



= a(G* 



a(G*"^'). (275) 



In the limit t oo, m goes to infinity and then the right-hand side converges to zero 
because of (|274p . Thus we have 

lim a(G*'') = 0, (276) 

t — >oo 

for any s. Therefore, from Thcorcm lB.4l the inhomogeneous Markov chain generated by 
G(t) is weakly ergodic. □ 
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B.3 Conditions for strong ergodicity 

The goal of this section is to give the proof of Theorem 15.21 Before that, we prove the 
foUowing Theorem, which also provides the sufficient condition for strong ergodicity. 

Theorem B.6. An inhomogeneous Markov chain generated by G{t) is strongly ergodic if 
there exists the transition matrix H on S such that H{z, x) = H{z, y) for any x,y,z £ S 
and 

-^t,s tjW _ r, (^277) 



lim G"'" -H\\=Q 



for any s > 0. 



Proof. We consider po G V and p{t, s) = G^'^po- For fixed u G S, we define a probability 
distribution r by r(z) = H{z,u). We find 



= E E [G'-%z,x)-Hiz,u)]poix) 

z^S xES 

^ E E - H{z, u)\^Y.J2 - Hiz, x)\ 



z£S x^S z^S x^S 

< ||G*'" - H\\ = \S\ ||G*'" - H\\ . 



(278) 



Taking the sup with respect to po G ^ BLiid using the assumption (|277p . we obtain 

lim sup {||p(t, s) - r\\ \poeV} = 0. (279) 

t — ^oo 

Therefore, the inhomogeneous Markov chain generated by G{t) is strongly ergodic. □ 

Proof of Theorem 15.21 We assume that the three conditions in Theorem 15.21 hold. 
Since the condition 3 is rewritten as 



x£S t=o 



we have 



(280) 



(281) 



for any x £ S. Thus, the stationary state pt converges to p = linit^ooPt- Now, let 
us define a transition matrices H and H(t) by H{z,x) = p{z) and H{z,x]t) — pt{z), 
respectively. For t > u > s >Q, 



\G^'' - H\\ < ||G*'"G"'^ - G*'"i/(u)|| 

+ ||G*'"i/(it) - H{t - 1)11 + \\H{t -1)-H\ 



(282) 



Thus, we evaluate each term on the right-hand side and show that (|277p holds. 
[1st term] Lemma IB. 31 vields that 

\I^Qt,nQu,s _ G-t,"^(u)|| < Q,(Gt.«) IIG"'" - H{u)\\ 

< 2a(G*'''), 



(283) 
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where we used HG"'* — H{u)\\ < 2. Since the Markov chain is weakly ergodic (condition 
1), Theorem IB .41 imphes that 



Ve > 0,3ti >Oyt>ti: G*'"G"'" - G*'"i?(?i) < 



(284) 



[2nd term] Since pt — G{t)pt (condition 2), we find 

H{u) = G{u)H{u) = G''+^'''H{u) (285) 

and then 

G^ ''H{u) = G*'"+iiy(u) = G*'"+i [H{u) - H{u + 1)] + G*'"+iiy(M + 1). (286) 
The last term on the right-hand side of the above equation is similarly rewritten as 



G*'"+ii7(w + 1) = G*^"+2 [H{u + 1) - H{u + 2)] + G^'^+^Hiu + 2). 
We recursively apply these relations and obtain 

G'-'^Hiu) = [^(^) - + 1)] + G'''-^H{t - 1) 

v=u 
t-2 

= ^ G*'"+i [i/(t;) - i?(t; + 1)] + H{t - 1). 



(287) 



(288) 



Thus the second term in (|282p is rewritten as 

t-2 



G*'"i7(w) - ff(t- 1) 



^ G*'"+Mi?(«) - i?(v + 1)] 



V — U 

t-2 



<^||G*-"+i [i^(^;)-i/(^;+l)]| 



Lemmas IB.ll and IB.3I yield that 

llG^'^'+i [i?(«)-i/(i; + l)]|| < ||i/(z;)-i/(t; + l)|| = 
where we used the definition of H{t). Thus we obtain 



t-2 



|G*'"if(M)-i/(t-l)|| < 



(289) 



(290) 



(291) 



Since X^t^o lb* ~ < (condition 3), for all e > 0, there exists ^2 > such that 



t-2 



Therefore 



Ve > 0,3t2 > 0,Vt > Vii > ^2 : G''"iJ(M) - H{t - 1) < - 



[3rd term] From the definitions of H and i?(t), they clearly satisfy 

lim \\H{t)-H\\ = 0, 

t — '■OO 



(292) 
(293) 
(294) 
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which impHes that 

Ve>0,3t3 >0,Vt>t3 : \\H{t^l)~H\\ < |. 



(295) 



Consequently, substitution of ((284| . ((2931) and (pM)) into (|282)) yields that 

\\G'^'-H\\<'-+'- + '-<s, (296) 

for all t > niaxj^i, t2, t^}. Since e is arbitrarily small, (|277p holds for any s > and then 
the given Markov chain is strongly ergodic from Theorem IB. 61 which completes the proof 
of the first part of Theorem 15.21 

Next, we assume p = limj^ooPt- For any distribution qq, we have Hqo — p because 

J2 H{z, x)qo{x) = p{z) J2 Qo{x) = p{z). (297) 

Thus, we obtain 

\\q{t,to)-p\\ - \\{G*^'" ~ H) qo\\ < ||G*^*«-F||. (298) 

Hence it holds for the sup with respect to qo E V, which yields p2ip in the limit of 
t — > oo. Theorem 15.21 is thereby proved. □ 
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